mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-03 12:57:02 -04:00

LocalAI [bot] ef15b4bfda fix(vllm): install ROCm vLLM from the AMD wheel index on Python 3.12 (#10651)

* fix(vllm): install ROCm vLLM from the AMD wheel index on Python 3.12

The rocm-vllm backend crashed at load with "No module named 'vllm'".
requirements-hipblas-after.txt requested a bare `vllm`, which resolves to
the CUDA-only PyPI wheel; that wheel is unusable on an AMD GPU. vLLM's
prebuilt ROCm wheels live on a dedicated index (https://wheels.vllm.ai/rocm/)
and are published only for CPython 3.12, so on the backend's default 3.10
the installer silently falls back to the CUDA wheel.

Add a hipblas branch to backend/python/vllm/install.sh that pins Python to
3.12 and installs vllm from the ROCm wheel index, hiding the bare-`vllm`
after-file so installRequirements installs only the base ROCm
torch/transformers first and does not pull the CUDA wheel.
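
A minimal sketch of what such a hipblas branch could look like. It prints the plan rather than installing anything. The helper name `vllm_install_plan` is hypothetical, and the `PYTHON_VERSION` variable and the `uv pip install --extra-index-url` invocation are assumptions about how install.sh is wired; only the wheel index URL, the 3.12 pin, and the 3.10 default come from the message above.

```shell
#!/bin/sh
# Sketch only: print the interpreter pin and install command per build type.
# vllm_install_plan is a hypothetical helper, not part of install.sh.
ROCM_WHEEL_INDEX="https://wheels.vllm.ai/rocm/"

vllm_install_plan() {
    build_type="$1"
    if [ "$build_type" = "hipblas" ]; then
        # ROCm wheels are published only for CPython 3.12, so pin it.
        echo "PYTHON_VERSION=3.12"
        # Assumed command shape; the real script may invoke the installer differently.
        echo "uv pip install vllm --extra-index-url ${ROCM_WHEEL_INDEX}"
    else
        # Other build types keep the backend's default interpreter.
        echo "PYTHON_VERSION=3.10"
        echo "vllm is resolved from the profile's requirements files"
    fi
}

vllm_install_plan hipblas
```

Pinning the interpreter before resolving the package matters here because a wheel index that has no build for the active Python version cannot satisfy the request at all.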

Fixes #10642

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(vllm): drop the dead hipblas-after requirement and its hide dance

requirements-hipblas-after.txt (a bare `vllm`) is never installed for
hipblas: installRequirements only adds requirements-${BUILD_PROFILE}-after.txt
when BUILD_TYPE != BUILD_PROFILE, and for hipblas they are equal. So the file
was dead and the install.sh hide/restore of it was a no-op. Remove both. The
hipblas branch already installs vllm explicitly from the ROCm wheel index, so
deleting the bare-`vllm` file also removes a latent CUDA-wheel trap should the
installRequirements gap ever be closed.
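
The condition described above can be sketched as follows. `profile_after_file` is a hypothetical helper that models only the stated rule (the profile after-file is added when BUILD_TYPE differs from BUILD_PROFILE); the real installRequirements handles more files, and the `cublas`/`cublas12` pair is an illustrative assumption.

```shell
#!/bin/sh
# Sketch of the after-file rule from the commit message.
# profile_after_file is hypothetical; it is not the real installRequirements.
profile_after_file() {
    build_type="$1"
    build_profile="$2"
    # The profile-specific after-file is only added when the two values differ.
    if [ "$build_type" != "$build_profile" ]; then
        echo "requirements-${build_profile}-after.txt"
    fi
}

# hipblas: type and profile are equal, so nothing is printed (the file was dead).
profile_after_file hipblas hipblas
# Illustrative differing pair: the after-file is selected.
profile_after_file cublas cublas12
```

Under this rule, requirements-hipblas-after.txt could never be selected, which is why removing it changes nothing for hipblas builds.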

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

2026-07-03 00:44:55 +02:00

| File | Last commit | Date |
| --- | --- | --- |
| backend.py | fix(vllm): non-streaming tool-call regression after #10351 (#10638) | 2026-07-02 09:26:14 +02:00 |
| install.sh | fix(vllm): install ROCm vLLM from the AMD wheel index on Python 3.12 (#10651) | 2026-07-03 00:44:55 +02:00 |
| Makefile | feat(mlx): add mlx backend (#6049) | 2025-08-22 08:42:29 +02:00 |
| package.sh | feat(vllm, distributed): tensor parallel distributed workers (#9612) | 2026-05-06 00:22:50 +02:00 |
| README.md | refactor: move backends into the backends directory (#1279) | 2023-11-13 22:40:16 +01:00 |
| requirements-after.txt | feat(vllm): parity with llama.cpp backend (#9328) | 2026-04-13 11:00:29 +02:00 |
| requirements-cpu-after.txt | feat(vllm): parity with llama.cpp backend (#9328) | 2026-04-13 11:00:29 +02:00 |
| requirements-cpu.txt | feat(vllm): parity with llama.cpp backend (#9328) | 2026-04-13 11:00:29 +02:00 |
| requirements-cublas12-after.txt | fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557) | 2026-04-25 15:38:13 +00:00 |
| requirements-cublas12.txt | fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557) | 2026-04-25 15:38:13 +00:00 |
| requirements-cublas13-after.txt | chore: ⬆️ Update vllm-project/vllm cu130 wheel to 0.24.0 (#10618) | 2026-07-01 08:53:03 +02:00 |
| requirements-cublas13.txt | feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553) | 2026-04-25 12:26:29 +02:00 |
| requirements-hipblas.txt | feat(rocm): bump to 7.x (#9323) | 2026-04-12 08:51:30 +02:00 |
| requirements-install.txt | fix(vllm): seed pybind11 for fastsafetensors build under --no-build-isolation | 2026-04-28 20:08:26 +00:00 |
| requirements-intel-after.txt | feat(vllm, distributed): tensor parallel distributed workers (#9612) | 2026-05-06 00:22:50 +02:00 |
| requirements-intel.txt | feat(vllm, distributed): tensor parallel distributed workers (#9612) | 2026-05-06 00:22:50 +02:00 |
| requirements-l4t13-after.txt | fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950) | 2026-05-22 23:01:22 +02:00 |
| requirements-l4t13.txt | fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950) | 2026-05-22 23:01:22 +02:00 |
| requirements.txt | chore(deps): bump grpcio from 1.81.0 to 1.81.1 in /backend/python/vllm (#10347) | 2026-06-15 22:57:38 +02:00 |
| run.sh | fix(python-backend): make JIT subprocesses work on hosts of any size (#9679) | 2026-05-06 00:28:01 +02:00 |
| test.py | feat(vllm): progressive streaming via parser.extract_tool_calls_streaming (follow-up to #10346) (#10351) | 2026-06-21 17:07:15 +02:00 |
| test.sh | feat: Add backend gallery (#5607) | 2025-06-15 14:56:52 +02:00 |

README.md

Creating a separate environment for the vllm project:

    make vllm