fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557)
The pinned flash-attn 2.8.3+cu12torch2.7 wheel breaks at import time
once vllm 0.19.1 upgrades torch to its hard-pinned 2.10.0:
ImportError: .../flash_attn_2_cuda...so: undefined symbol:
_ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib
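A minimal reproducer for this failure mode (illustrative, not part of the
commit) is to import the compiled extension directly; the dynamic linker
resolves its libtorch symbols at load time, so an ABI mismatch fails here
rather than at first inference:

    # Illustrative smoke test, assuming torch and flash-attn are installed.
    # torch must be imported first so libc10/libtorch are loaded into the
    # process before the extension's symbols are resolved.
    import importlib

    import torch

    try:
        importlib.import_module("flash_attn_2_cuda")  # the .so from the error
    except ImportError as exc:
        raise SystemExit(
            f"flash-attn is ABI-broken against torch {torch.__version__}: {exc}"
        )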
That C10 CUDA symbol is libtorch-version-specific. Dao-AILab has not yet
published flash-attn wheels for torch 2.10 -- the latest release (2.8.3)
tops out at torch 2.8 -- so any wheel pinned here is silently ABI-broken
the moment vllm completes its install.
vllm 0.19.1 lists flashinfer-python==0.6.6 as a hard dep, which already
covers the attention path. The only other use of flash-attn in vllm is
the rotary apply_rotary import in
vllm/model_executor/layers/rotary_embedding/common.py, which is guarded
by find_spec("flash_attn") and falls back cleanly when absent.
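The guard pattern that makes this fallback safe looks roughly like the
sketch below (a paraphrase of the described check, not the verbatim vllm
source; the exact import path is an assumption):

    # Sketch of the guard: only use flash-attn's fused rotary kernel when
    # the package is importable; otherwise leave the fallback path in place.
    from importlib.util import find_spec

    if find_spec("flash_attn") is not None:
        from flash_attn.ops.triton.rotary import apply_rotary  # assumed path
    else:
        apply_rotary = None  # callers take the non-flash rotary code path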
Also unpin torch in requirements-cublas12.txt: the 2.7.0 pin only
existed to give the flash-attn wheel a matching torch to link against.
With flash-attn gone, vllm's own torch==2.10.0 dep is the binding
constraint regardless of what we put here.
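That resolution can be sanity-checked after the image builds with a short
stdlib-only snippet (illustrative):

    # Confirm vllm's hard torch pin is what actually got installed,
    # regardless of the unpinned 'torch' entry in requirements-cublas12.txt.
    from importlib.metadata import requires, version

    print("installed torch:", version("torch"))
    # startswith("torch") also catches torchaudio/torchvision pins, which
    # is fine for a visual check.
    print("vllm torch pins:",
          [r for r in (requires("vllm") or []) if r.startswith("torch")])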
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
commit 73aacad2f9 (parent 806ea24ff4), committed by GitHub
@@ -1,2 +1,9 @@
-https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
+# flash-attn wheels are ABI-tied to a specific torch version. vllm forces
+# torch==2.10.0 as a hard dep, but flash-attn 2.8.3 (latest) only ships
+# prebuilt wheels up to torch 2.8 — any wheel we pin here gets silently
+# broken when vllm upgrades torch during install, producing an undefined
+# libc10_cuda symbol at import time. FlashInfer (required by vllm) covers
+# attention, and rotary_embedding/common.py guards the flash_attn import
+# with find_spec(), so skipping flash-attn is safe and the only stable
+# choice until upstream ships a torch-2.10 wheel.
 vllm
requirements-cublas12.txt:
@@ -1,4 +1,4 @@
 accelerate
-torch==2.7.0
+torch
 transformers
 bitsandbytes