LocalAI/backend/python/vllm
Richard Palethorpe 73aacad2f9 fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557)
The pinned flash-attn 2.8.3+cu12torch2.7 wheel breaks at import time
once vllm 0.19.1 upgrades torch to its hard-pinned 2.10.0:

  ImportError: .../flash_attn_2_cuda...so: undefined symbol:
  _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib

That C10 CUDA symbol is libtorch-version-specific. Dao-AILab has not yet
published flash-attn wheels for torch 2.10 -- the latest release (2.8.3)
tops out at torch 2.8 -- so any wheel pinned here is silently ABI-broken
the moment vllm completes its install.
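
For illustration only (not part of this change), a runtime check like
the one below surfaces the mismatch up front instead of at the first
attention call. The helper name is hypothetical:

  # Hypothetical sanity check: a wheel built against a different
  # libtorch fails at import with an undefined C10 symbol, exactly
  # the failure shown above.
  import importlib.util

  import torch

  def flash_attn_usable() -> bool:
      if importlib.util.find_spec("flash_attn") is None:
          return False
      try:
          import flash_attn  # noqa: F401 -- loading the .so is the real test
          return True
      except ImportError as exc:
          print(f"flash-attn unusable on torch {torch.__version__}: {exc}")
          return False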

vllm 0.19.1 lists flashinfer-python==0.6.6 as a hard dep, which already
covers the attention path. The only other place vllm uses flash-attn
is the apply_rotary import in
vllm/model_executor/layers/rotary_embedding/common.py, which is
guarded by find_spec("flash_attn") and falls back cleanly when the
package is absent.
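
A simplified sketch of that guard pattern follows; vllm's actual
fallback differs in detail, and the pure-torch stand-in below is
illustrative, not vllm's code:

  from importlib.util import find_spec

  if find_spec("flash_attn") is not None:
      # Fused kernel when the wheel is present and importable.
      from flash_attn.ops.triton.rotary import apply_rotary
  else:
      import torch

      def apply_rotary(x, cos, sin):
          # Illustrative fallback: rotate pairs (x1, x2) by (cos, sin).
          x1, x2 = x.chunk(2, dim=-1)
          return torch.cat((x1 * cos - x2 * sin,
                            x2 * cos + x1 * sin), dim=-1)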

Also unpin torch in requirements-cublas12.txt: the 2.7.0 pin only
existed to give the flash-attn wheel a matching torch to link against.
With flash-attn gone, vllm's own torch==2.10.0 dep is the binding
constraint regardless of what we put here.
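
A quick, purely illustrative way to confirm the resolver outcome after
install (assumes vllm is installed in the environment):

  # Hypothetical post-install check: torch's version should now come
  # from vllm's own requirement, not from requirements-cublas12.txt.
  from importlib.metadata import requires, version

  print(version("torch"))  # whatever vllm pinned, e.g. 2.10.0
  print([r for r in (requires("vllm") or [])
         if r.lower().startswith("torch")])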

Assisted-by: Claude:claude-opus-4-7 [Claude Code]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-04-25 15:38:13 +00:00

Creating a separate environment for the vllm project:

make vllm