ci(vllm): use bigger-runner instead of source build

The prebuilt vllm 0.14.1+cpu wheel requires SIMD instructions (AVX-512
VNNI/BF16) that stock ubuntu-latest GitHub runners don't support:
vllm.model_executor.models.registry SIGILLs on import during LoadModel.

Source compilation works but takes 30-40 minutes per CI run, which is
too slow for an e2e smoke test. Instead, switch tests-vllm-grpc to the
bigger-runner self-hosted label (already used by backend.yml for the
llama-cpp CUDA build). That hardware has the required SIMD baseline and
the prebuilt wheel runs cleanly.

FROM_SOURCE=true is kept as an opt-in escape hatch:

- install.sh still has the CPU source-build path for hosts that need it
- backend/Dockerfile.python still declares the ARG + ENV
- Makefile docker-build-backend still forwards the build-arg when set

Default CI path uses the fast prebuilt wheel; source build can be
re-enabled by exporting FROM_SOURCE=true in the environment.
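
To decide whether a given host needs the FROM_SOURCE=true escape hatch,
the SIMD baseline can be checked up front. The following is a minimal
sketch, not part of this commit: needs_source_build is a hypothetical
helper, and the required flag list (avx512_vnni, avx512_bf16, as named
in /proc/cpuinfo on Linux) is an assumption taken from the examples
above rather than a confirmed list of what the wheel needs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in install.sh): prints "true" when any of the
# assumed-required SIMD flags is missing from the given flag list.
# $1: space-separated CPU flags, as on the "flags" line of /proc/cpuinfo.
needs_source_build() {
    local flags=" $1 "   # pad so every flag is surrounded by spaces
    local required
    # Assumption: these two flags approximate the wheel's SIMD baseline.
    for required in avx512_vnni avx512_bf16; do
        case "$flags" in
            *" $required "*) ;;              # flag present, keep checking
            *) echo "true"; return 0 ;;      # flag missing, build from source
        esac
    done
    echo "false"
}

# Usage on a Linux host: read the first "flags" line and report the result.
if [ -r /proc/cpuinfo ]; then
    host_flags="$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
    echo "FROM_SOURCE=$(needs_source_build "$host_flags")"
fi
```

A result of FROM_SOURCE=true only suggests the prebuilt wheel may
SIGILL on this host; the import failure described above remains the
authoritative signal.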
2026-06-08 00:36:37 -04:00 · 2026-04-12 16:02:49 +00:00
parent 329df11989
commit ea2bbabffd
4 changed files with 19 additions and 21 deletions
--- a/backend/python/vllm/install.sh
+++ b/backend/python/vllm/install.sh
@@ -32,12 +32,12 @@ if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
 fi

-# When FROM_SOURCE=true on a CPU build, skip the prebuilt wheel in
-# requirements-cpu-after.txt and compile vllm locally against the host's
-# actual CPU. The prebuilt CPU wheels from vllm releases are compiled with
-# wider SIMD (AVX-512 VNNI/BF16 etc.) than some environments support — in
-# particular GitHub Actions runners SIGILL on the vllm model registry
-# subprocess. FROM_SOURCE=true avoids that at the cost of a longer install.
+# FROM_SOURCE=true on a CPU build skips the prebuilt vllm wheel in
+# requirements-cpu-after.txt and compiles vllm locally against the host's
+# actual CPU. Not used by default because it takes ~30-40 minutes, but
+# kept here for hosts where the prebuilt wheel SIGILLs (CPU without the
+# required SIMD baseline, e.g. AVX-512 VNNI/BF16). Default CI uses a
+# bigger-runner with compatible hardware instead.
 if [ "x${BUILD_TYPE}" == "x" ] && [ "x${FROM_SOURCE:-}" == "xtrue" ]; then
    # Temporarily hide the prebuilt wheel so installRequirements doesn't
    # pull it — the rest of the requirements files (base deps, torch,