ci(vllm): use bigger-runner instead of source build

The prebuilt vllm 0.14.1+cpu wheel requires SIMD instructions (AVX-512 VNNI/BF16) that stock ubuntu-latest GitHub runners don't support: vllm.model_executor.models.registry SIGILLs on import during LoadModel. Source compilation works but takes 30-40 minutes per CI run, which is too slow for an e2e smoke test.

Instead, switch tests-vllm-grpc to the bigger-runner self-hosted label (already used by backend.yml for the llama-cpp CUDA build). That hardware has the required SIMD baseline and the prebuilt wheel runs cleanly.

FROM_SOURCE=true is kept as an opt-in escape hatch:
- install.sh still has the CPU source-build path for hosts that need it
- backend/Dockerfile.python still declares the ARG + ENV
- Makefile docker-build-backend still forwards the build-arg when set

Default CI path uses the fast prebuilt wheel; source build can be re-enabled by exporting FROM_SOURCE=true in the environment.
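
To tell in advance whether a given host would hit the SIGILL described above, something like the following could check the CPU flags before importing vllm. This is a minimal sketch, not part of this commit: it assumes an x86 Linux host exposing /proc/cpuinfo, the helper names are hypothetical, and the flag set is only the Linux spelling of the two extensions named in this message (the wheel's full requirements are not listed here).

```python
# Hypothetical pre-flight check; not part of this commit.
# Assumption: the two extensions named in the commit message, spelled the way
# Linux reports them in /proc/cpuinfo. The wheel may require more than these.
REQUIRED_FLAGS = frozenset({"avx512_vnni", "avx512_bf16"})


def missing_simd_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the sorted required flags absent from the first 'flags' line.

    If no 'flags' line is found (empty input, non-x86 host), every required
    flag is reported as missing.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return sorted(set(required) - set(value.split()))
    return sorted(required)


def check_host(path="/proc/cpuinfo"):
    """Read the host's cpuinfo and return the missing flags (empty list = ok)."""
    try:
        with open(path) as f:
            text = f.read()
    except OSError:
        # Unreadable or absent cpuinfo is treated as "nothing supported".
        text = ""
    return missing_simd_flags(text)
```

A non-empty result from check_host() would suggest the prebuilt wheel is unsafe on that host, in which case exporting FROM_SOURCE=true is the fallback this commit keeps available.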
2026-04-17 13:28:31 -04:00 · 2026-04-12 16:02:49 +00:00
parent 329df11989
commit ea2bbabffd
4 changed files with 19 additions and 21 deletions
--- a/.github/workflows/test-extra.yml
+++ b/.github/workflows/test-extra.yml
@@ -505,8 +505,12 @@ jobs:
  tests-vllm-grpc:
    needs: detect-changes
    if: needs.detect-changes.outputs.vllm == 'true' || needs.detect-changes.outputs.run-all == 'true'
-    runs-on: ubuntu-latest
-    timeout-minutes: 120
+    # The prebuilt vllm CPU wheel is compiled with AVX-512 VNNI/BF16
+    # instructions; stock ubuntu-latest runners SIGILL on import of
+    # vllm.model_executor.models.registry. bigger-runner has newer
+    # hardware that supports the required SIMD.
+    runs-on: bigger-runner
+    timeout-minutes: 90
    steps:
      - name: Clone
        uses: actions/checkout@v6
@@ -521,12 +525,6 @@ jobs:
          sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
          df -h
      - name: Build vllm (cpu) backend image and run gRPC e2e tests
-        env:
-          # GitHub Actions runners don't all support the SIMD instructions
-          # the prebuilt vllm CPU wheel was compiled against (SIGILL in
-          # vllm.model_executor.models.registry on import). Build vllm from
-          # source so it targets the actual CI CPU.
-          FROM_SOURCE: "true"
        run: |
          make test-extra-backend-vllm
  tests-acestep-cpp: