mirror of
https://github.com/mudler/LocalAI.git
synced 2026-04-17 05:18:53 -04:00
ci(vllm): use bigger-runner instead of source build
The prebuilt vllm 0.14.1+cpu wheel requires SIMD instructions (AVX-512 VNNI/BF16) that stock ubuntu-latest GitHub runners don't support: vllm.model_executor.models.registry SIGILLs on import during LoadModel. Source compilation works but takes 30-40 minutes per CI run, which is too slow for an e2e smoke test.

Instead, switch tests-vllm-grpc to the bigger-runner self-hosted label (already used by backend.yml for the llama-cpp CUDA build); that hardware has the required SIMD baseline and the prebuilt wheel runs cleanly.

FROM_SOURCE=true is kept as an opt-in escape hatch:
- install.sh still has the CPU source-build path for hosts that need it
- backend/Dockerfile.python still declares the ARG + ENV
- Makefile docker-build-backend still forwards the build-arg when set

Default CI path uses the fast prebuilt wheel; source build can be re-enabled by exporting FROM_SOURCE=true in the environment.
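To illustrate the trade-off described in the message, here is a minimal sketch of how a host could decide whether it needs the source build. It is not part of this commit. The helper names `needs_source_build` and `host_flags` are hypothetical, and the sketch assumes that both the `avx512_vnni` and `avx512_bf16` flags from `/proc/cpuinfo` are required, which the message does not state precisely.

```shell
#!/bin/sh
# Hypothetical helper, not part of the LocalAI repository.
# Prints "true" when the given CPU flag list lacks any flag the prebuilt
# wheel is ASSUMED to need; prints nothing when all of them are present.
needs_source_build() {
    cpu_flags=" $1 "   # pad with spaces so whole-word matching works
    for flag in avx512_vnni avx512_bf16; do
        case "$cpu_flags" in
            *" $flag "*) ;;                 # flag present, keep checking
            *) echo "true"; return 0 ;;     # flag missing, source build needed
        esac
    done
}

# Print the flag list of the first CPU on Linux; prints nothing elsewhere.
host_flags() {
    if [ -r /proc/cpuinfo ]; then
        grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2
    fi
}

# Respect an explicit FROM_SOURCE from the environment; otherwise detect.
FROM_SOURCE="${FROM_SOURCE:-$(needs_source_build "$(host_flags)")}"
export FROM_SOURCE
echo "FROM_SOURCE=${FROM_SOURCE}"
```

On a host without a readable `/proc/cpuinfo` the flag list is empty, so this sketch falls back to the source build. An explicitly exported `FROM_SOURCE` always wins over detection.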
@@ -195,8 +195,9 @@ COPY backend/backend.proto /${BACKEND}/backend.proto
 COPY backend/python/common/ /${BACKEND}/common
 COPY scripts/build/package-gpu-libs.sh /package-gpu-libs.sh
 
-# Optional per-backend source build toggle (e.g. vllm on CPU needs to
-# compile against the host SIMD instead of using the prebuilt wheel).
+# Optional per-backend source build toggle (e.g. vllm on CPU can set
+# FROM_SOURCE=true to compile against the build host SIMD instead of
+# pulling a prebuilt wheel). Default empty — most backends ignore it.
 ARG FROM_SOURCE=""
 ENV FROM_SOURCE=${FROM_SOURCE}
 
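The ARG/ENV pair in the hunk above only carries the toggle into the image; something still has to pass it in and act on it. The following is a rough sketch of both ends, assuming a wrapper that forwards the build-arg only when it is set and a backend script that branches on the value. The function names `build_arg_for` and `install_mode` are hypothetical, the real Makefile and install.sh may differ, and the script only prints what it would do.

```shell
#!/bin/sh
# Hypothetical sketch; the real Makefile target and install.sh may differ.

# Emit the extra `docker build` flag only when FROM_SOURCE is non-empty,
# so the Dockerfile default (empty string) applies otherwise.
build_arg_for() {
    if [ -n "$1" ]; then
        printf '%s\n' "--build-arg FROM_SOURCE=$1"
    fi
}

# Decide which install path a backend script would take inside the image.
install_mode() {
    if [ "$1" = "true" ]; then
        echo "source"   # compile against the build host's SIMD
    else
        echo "wheel"    # pull the prebuilt wheel (default)
    fi
}

# Dry run: print the command and the chosen mode instead of executing them.
echo "docker build $(build_arg_for "${FROM_SOURCE:-}") -f backend/Dockerfile.python ."
echo "install mode: $(install_mode "${FROM_SOURCE:-}")"
```

Running it with `FROM_SOURCE=true` in the environment shows the build-arg being forwarded; running it with the variable unset shows the default prebuilt-wheel path.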