ci: finish GHA free-tier migration (per-arch fan-out, image splits, retire self-hosted, fix provenance) (#9730)

* ci: add per-arch + manifest-merge support for LocalAI server image

Mirror the backend_build.yml + backend_merge.yml pattern shipped in PR #9726 for the LocalAI server image:

- image_build.yml accepts optional platform-tag (default ''), scopes registry cache to cache-localai<suffix>-<platform-tag>, and pushes by canonical digest only on push events. Digests upload as artifacts named digests-localai<suffix>-<platform-tag>, with a "-core" placeholder when tag-suffix is empty so the merge job's download pattern doesn't over-match across multiple suffixes.
- image_merge.yml is a new reusable workflow that downloads matching digest artifacts and assembles the final tagged manifest list via docker buildx imagetools create.

Image names differ from backend_*.yml: the LocalAI server is published under quay.io/go-skynet/local-ai and localai/localai (not -backends).

Not yet wired into image.yml / image-pr.yml; Commit C does that.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: fan out per-arch split to remaining 34 backends

Convert all remaining linux/amd64,linux/arm64 entries in backend-matrix.yml to per-arch + manifest-merge form. Each was a single matrix entry running both arches on x86 under QEMU emulation; each becomes two entries: amd64 on ubuntu-latest, arm64 on ubuntu-24.04-arm (native).

Four backends that were on bigger-runner (-cpu-llama-cpp, -cpu-turboquant, -gpu-vulkan-llama-cpp, -gpu-vulkan-turboquant) have both legs moved to free tier as part of the same change. They are compile-only (no torch/CUDA install) and fit comfortably with the setup-build-disk /mnt relocation. Phase 4 (next commit) retires the remaining 5 single-arch bigger-runner entries.

After this commit:

- 271 total matrix entries (was 237)
- 0 multi-arch entries left
- 36 per-arch pairs (34 new + 2 pilots from PR #9727)
- 5 bigger-runner entries remaining (single-arch, Phase 4 target)

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: split LocalAI image multi-arch entries per arch + merge

Mirror the backend per-arch split for the main LocalAI image:

- image.yml's core-image-build matrix: split the core ('') and -gpu-vulkan entries into amd64 + arm64 legs each. amd64 on ubuntu-latest, arm64 on ubuntu-24.04-arm (native).
- New top-level core-image-merge and gpu-vulkan-image-merge jobs call image_merge.yml after core-image-build completes.
- image-pr.yml's image-build matrix: split the -vulkan-core entry. No merge job added on the PR side: image_build.yml's digest-push is push-only-event-gated, so a PR-side merge would have nothing to download.

After this commit, no workflow file references linux/amd64,linux/arm64 in a single matrix slot.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: retire bigger-runner from backend matrix (Phase 4)

Migrate the remaining 5 single-arch bigger-runner entries to ubuntu-latest. Combined with the Phase 3 setup-build-disk /mnt relocation (PR #9726), free-tier ubuntu-latest now has ~100 GB of working space, enough for ROCm dev image (~16 GB), CUDA toolkit (~5 GB), and the per-backend compile/install steps these entries do.

Backends migrated:

- -gpu-nvidia-cuda-12-llama-cpp
- -gpu-nvidia-cuda-12-turboquant
- -gpu-rocm-hipblas-faster-whisper
- -gpu-rocm-hipblas-coqui
- -cpu-ik-llama-cpp

After this commit, .github/backend-matrix.yml has zero bigger-runner references. The bigger-runner used in tests-vibevoice-cpp-grpc-transcription (test-extra.yml) is a separate concern handled in a follow-up.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: migrate 9 Intel oneAPI backends to free tier (Phase 5.1)

Intel oneAPI base image is ~6 GB; each backend's wheel install stays well within the ~100 GB working space provided by Phase 3's setup-build-disk /mnt relocation. Lowest-risk batch of the arc-runner-set retirement.

Backends migrated: vllm, sglang, vibevoice, qwen-asr, nemo, qwen-tts, fish-speech, voxcpm, pocket-tts (all -gpu-intel-* variants).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: migrate 15 ROCm Python backends to free tier (Phase 5.2)

ROCm dev image (~16 GB) plus per-backend torch/wheels install fits on ubuntu-latest with the /mnt-relocated Docker root. These entries include the heavier vLLM/sglang/transformers/diffusers stack on ROCm; if any specific backend OOMs or runs out of disk, individual flips back to arc-runner-set are revertable per-entry.

Backends migrated: all 15 -gpu-rocm-hipblas-* entries previously on arc-runner-set (vllm/vllm-omni/sglang/transformers/diffusers/ace-step/kokoro/vibevoice/qwen-asr/nemo/qwen-tts/fish-speech/voxcpm/pocket-tts/neutts).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: migrate 6 CUDA Python backends to free tier (Phase 5.3)

vLLM/sglang stacks on CUDA 12 and CUDA 13 are the heaviest backends in the matrix: flash-attn intermediate layers can spike disk usage during build. setup-build-disk's /mnt relocation gives ~100 GB working space which fits the documented peak. Highest-risk batch of the arc-runner-set retirement; if any backend fails to build on free tier, the per-entry runs-on flip is the unit of revert.

Backends migrated: -gpu-nvidia-cuda-{12,13}-{vllm,vllm-omni,sglang}.

After this commit, .github/backend-matrix.yml has zero references to arc-runner-set or bigger-runner. The migration is complete.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci: disable provenance on multi-registry digest pushes

Root-caused on master via PR #9727's pilot: when docker/build-push-action@v7 pushes a single build to TWO registries simultaneously with push-by-digest=true, buildx generates a per-registry provenance attestation manifest (because mode=max, the default for push:true, includes the runner ID). That makes the resulting manifest-list digest diverge across registries:

  arm64 -cpu-faster-whisper build:
    image manifest:          sha256:d3bdd34b... (identical, content-only)
    quay manifest list:      sha256:66b4cfc8... (with quay attestation)
    dockerhub manifest list: sha256:e0733c3b... (with dockerhub attestation)

steps.build.outputs.digest returns only one of the list digests (empirically the dockerhub one). The merge job then asks "quay.io/...@sha256:e0733c3b..." which doesn't exist on quay; that list has digest 66b4cfc8 there. Result: imagetools create fails with "not found" and the merge job fails (run 25581983094, job 75110021491).

Setting provenance: false drops the per-registry attestation; the manifest-list digest becomes pure content, identical across both registries, and steps.build.outputs.digest works on either lookup.

Applied to backend_build.yml and image_build.yml, both refactored to use the same multi-registry digest-push pattern in the prior PRs.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
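The "-core" placeholder mentioned in the first commit can be illustrated with a small bash sketch. It is not part of the workflows: artifact_name and matches are hypothetical helpers, and a shell glob only approximates the pattern matching that actions/download-artifact performs. The point is that a bare "digests-localai-*" pattern for the core image would also pick up the -gpu-vulkan artifacts, while "digests-localai-core-*" does not.

```shell
#!/usr/bin/env bash

# Hypothetical helper mirroring the workflow expression
#   inputs.tag-suffix == '' && '-core' || inputs.tag-suffix
# An empty suffix is replaced by the "-core" placeholder.
artifact_name() {
  local suffix="$1" platform_tag="$2"
  printf 'digests-localai%s-%s\n' "${suffix:--core}" "$platform_tag"
}

# Hypothetical helper: succeeds when the name matches the glob pattern.
# This is an approximation of the download step's pattern matching.
matches() {
  local name="$1" pattern="$2"
  # shellcheck disable=SC2254
  case "$name" in
    $pattern) return 0 ;;
    *) return 1 ;;
  esac
}

core_amd64=$(artifact_name '' 'amd64')
vulkan_arm64=$(artifact_name '-gpu-vulkan' 'arm64')
echo "$core_amd64"
echo "$vulkan_arm64"

# Without the placeholder, the core merge pattern would be "digests-localai-*",
# which also matches the -gpu-vulkan artifact.
if matches "$vulkan_arm64" 'digests-localai-*'; then
  echo "bare pattern over-matches $vulkan_arm64"
fi

# With the placeholder, the core pattern only matches core artifacts.
if ! matches "$vulkan_arm64" 'digests-localai-core-*'; then
  echo "placeholder pattern skips $vulkan_arm64"
fi
```

The same reasoning applies to any future suffix: each merge job downloads only artifacts whose name starts with its own suffix followed by a platform tag.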
2026-05-17 04:56:52 -04:00 · 2026-05-09 09:37:00 +02:00
parent fe7b27eb66
commit f0374aa0e8
6 changed files with 821 additions and 82 deletions
--- a/.github/workflows/backend_build.yml
+++ b/.github/workflows/backend_build.yml
@@ -198,6 +198,16 @@ jobs:
          outputs: |
            type=image,name=quay.io/go-skynet/local-ai-backends,push-by-digest=true,name-canonical=true,push=true
            type=image,name=localai/localai-backends,push-by-digest=true,name-canonical=true,push=true
+          # Disable provenance: with mode=max (the default for push:true)
+          # buildx bundles a per-registry attestation manifest into each
+          # registry's manifest list, which makes the resulting list digest
+          # diverge across registries. steps.build.outputs.digest then
+          # only matches one of them, and the merge job's
+          # `imagetools create <reg>@sha256:<digest>` lookup fails on the
+          # other. Disabling provenance keeps the digest content-only and
+          # identical across both registries — required for digest-based
+          # cross-registry merge.
+          provenance: false
          labels: ${{ steps.meta.outputs.labels }}

      - name: Export digest
--- a/.github/workflows/image-pr.yml
+++ b/.github/workflows/image-pr.yml
@@ -18,6 +18,7 @@
        cuda-major-version: ${{ matrix.cuda-major-version }}
        cuda-minor-version: ${{ matrix.cuda-minor-version }}
        platforms: ${{ matrix.platforms }}
+        platform-tag: ${{ matrix.platform-tag || '' }}
        runs-on: ${{ matrix.runs-on }}
        base-image: ${{ matrix.base-image }}
        makeflags: ${{ matrix.makeflags }}
@@ -71,13 +72,23 @@
              makeflags: "--jobs=3 --output-sync=target"
              ubuntu-version: '2404'
            - build-type: 'vulkan'
-              platforms: 'linux/amd64,linux/arm64'
+              platforms: 'linux/amd64'
+              platform-tag: 'amd64'
              tag-latest: 'false'
              tag-suffix: '-vulkan-core'
              runs-on: 'ubuntu-latest'
              base-image: "ubuntu:24.04"
              makeflags: "--jobs=4 --output-sync=target"
              ubuntu-version: '2404'
+            - build-type: 'vulkan'
+              platforms: 'linux/arm64'
+              platform-tag: 'arm64'
+              tag-latest: 'false'
+              tag-suffix: '-vulkan-core'
+              runs-on: 'ubuntu-24.04-arm'
+              base-image: "ubuntu:24.04"
+              makeflags: "--jobs=4 --output-sync=target"
+              ubuntu-version: '2404'
            - build-type: 'cublas'
              cuda-major-version: "13"
              cuda-minor-version: "0"
--- a/.github/workflows/image.yml
+++ b/.github/workflows/image.yml
@@ -56,6 +56,7 @@
        cuda-major-version: ${{ matrix.cuda-major-version }}
        cuda-minor-version: ${{ matrix.cuda-minor-version }}
        platforms: ${{ matrix.platforms }}
+        platform-tag: ${{ matrix.platform-tag || '' }}
        runs-on: ${{ matrix.runs-on }}
        base-image: ${{ matrix.base-image }}
        makeflags: ${{ matrix.makeflags }}
@@ -72,7 +73,8 @@
        matrix:
          include:
            - build-type: ''
-              platforms: 'linux/amd64,linux/arm64'
+              platforms: 'linux/amd64'
+              platform-tag: 'amd64'
              tag-latest: 'auto'
              tag-suffix: ''
              base-image: "ubuntu:24.04"
@@ -81,6 +83,17 @@
              skip-drivers: 'false'
              ubuntu-version: '2404'
              ubuntu-codename: 'noble'
+            - build-type: ''
+              platforms: 'linux/arm64'
+              platform-tag: 'arm64'
+              tag-latest: 'auto'
+              tag-suffix: ''
+              base-image: "ubuntu:24.04"
+              runs-on: 'ubuntu-24.04-arm'
+              makeflags: "--jobs=4 --output-sync=target"
+              skip-drivers: 'false'
+              ubuntu-version: '2404'
+              ubuntu-codename: 'noble'
            - build-type: 'cublas'
              cuda-major-version: "12"
              cuda-minor-version: "8"
@@ -106,7 +119,8 @@
              ubuntu-version: '2404'
              ubuntu-codename: 'noble'
            - build-type: 'vulkan'
-              platforms: 'linux/amd64,linux/arm64'
+              platforms: 'linux/amd64'
+              platform-tag: 'amd64'
              tag-latest: 'auto'
              tag-suffix: '-gpu-vulkan'
              runs-on: 'ubuntu-latest'
@@ -115,6 +129,17 @@
              makeflags: "--jobs=4 --output-sync=target"
              ubuntu-version: '2404'
              ubuntu-codename: 'noble'
+            - build-type: 'vulkan'
+              platforms: 'linux/arm64'
+              platform-tag: 'arm64'
+              tag-latest: 'auto'
+              tag-suffix: '-gpu-vulkan'
+              runs-on: 'ubuntu-24.04-arm'
+              base-image: "ubuntu:24.04"
+              skip-drivers: 'false'
+              makeflags: "--jobs=4 --output-sync=target"
+              ubuntu-version: '2404'
+              ubuntu-codename: 'noble'
            - build-type: 'intel'
              platforms: 'linux/amd64'
              tag-latest: 'auto'
@@ -124,6 +149,32 @@
              makeflags: "--jobs=3 --output-sync=target"
              ubuntu-version: '2404'
              ubuntu-codename: 'noble'
+
+    core-image-merge:
+      if: github.repository == 'mudler/LocalAI'
+      needs: core-image-build
+      uses: ./.github/workflows/image_merge.yml
+      with:
+        tag-latest: 'auto'
+        tag-suffix: ''
+      secrets:
+        dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
+        dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
+        quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+        quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+
+    gpu-vulkan-image-merge:
+      if: github.repository == 'mudler/LocalAI'
+      needs: core-image-build
+      uses: ./.github/workflows/image_merge.yml
+      with:
+        tag-latest: 'auto'
+        tag-suffix: '-gpu-vulkan'
+      secrets:
+        dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
+        dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
+        quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+        quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
  
    gh-runner:
      if: github.repository == 'mudler/LocalAI'
--- a/.github/workflows/image_build.yml
+++ b/.github/workflows/image_build.yml
@@ -24,6 +24,15 @@ on:
        description: 'Platforms'
        default: ''
        type: string
+      platform-tag:
+        description: |
+          Short tag identifying the platform leg, e.g. "amd64" or "arm64".
+          Used to scope the per-arch registry cache and the digest artifact name.
+          Optional during the migration; will be flipped to required: true once
+          every caller passes an explicit value.
+        required: false
+        default: ''
+        type: string
      tag-latest:
        description: 'Tag latest'
        default: ''
@@ -138,7 +147,8 @@ jobs:
          username: ${{ secrets.quayUsername }}
          password: ${{ secrets.quayPassword }}

-      - name: Build and push
+      - name: Build and push by digest
+        id: build
        uses: docker/build-push-action@v7
        if: github.event_name != 'pull_request'
        with:
@@ -156,12 +166,33 @@ jobs:
            APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
          context: .
          file: ./Dockerfile
-          cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
-          cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }},mode=max,ignore-error=true
+          cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}-${{ inputs.platform-tag }}
+          cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}-${{ inputs.platform-tag }},mode=max,ignore-error=true
          platforms: ${{ inputs.platforms }}
-          push: ${{ github.event_name != 'pull_request' }}
-          tags: ${{ steps.meta.outputs.tags }}
+          outputs: |
+            type=image,name=quay.io/go-skynet/local-ai,push-by-digest=true,name-canonical=true,push=true
+            type=image,name=localai/localai,push-by-digest=true,name-canonical=true,push=true
+          # See backend_build.yml for the rationale — provenance=mode=max
+          # diverges the manifest-list digest per registry, breaking the
+          # downstream imagetools create lookup.
+          provenance: false
          labels: ${{ steps.meta.outputs.labels }}
+
+      - name: Export digest
+        if: github.event_name != 'pull_request'
+        run: |
+          mkdir -p /tmp/digests
+          digest="${{ steps.build.outputs.digest }}"
+          touch "/tmp/digests/${digest#sha256:}"
+
+      - name: Upload digest artifact
+        if: github.event_name != 'pull_request'
+        uses: actions/upload-artifact@v4
+        with:
+          name: digests-localai${{ inputs.tag-suffix == '' && '-core' || inputs.tag-suffix }}-${{ inputs.platform-tag }}
+          path: /tmp/digests/*
+          if-no-files-found: error
+          retention-days: 1
 ### Start testing image
      - name: Build and push
        uses: docker/build-push-action@v7
@@ -181,7 +212,7 @@ jobs:
            APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
          context: .
          file: ./Dockerfile
-          cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
+          cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}-${{ inputs.platform-tag }}
          platforms: ${{ inputs.platforms }}
          #push: true
          tags: ${{ steps.meta_pull_request.outputs.tags }}
--- a/.github/workflows/image_merge.yml
+++ b/.github/workflows/image_merge.yml
@@ -0,0 +1,117 @@
+---
+name: 'merge LocalAI image manifest list (reusable)'
+
+# Reusable workflow that joins per-arch digest artifacts (uploaded by
+# image_build.yml when called with platform-tag) into a single tagged
+# multi-arch manifest list.
+
+on:
+  workflow_call:
+    inputs:
+      tag-latest:
+        description: 'Whether the manifest list should also be tagged latest (auto/false/true)'
+        required: false
+        type: string
+        default: ''
+      tag-suffix:
+        description: 'Image tag suffix (empty for core image). Used in artifact pattern with a -core placeholder for empty.'
+        required: true
+        type: string
+    secrets:
+      dockerUsername:
+        required: false
+      dockerPassword:
+        required: false
+      quayUsername:
+        required: true
+      quayPassword:
+        required: true
+
+jobs:
+  merge:
+    runs-on: ubuntu-latest
+    env:
+      quay_username: ${{ secrets.quayUsername }}
+    steps:
+      - name: Download digests
+        uses: actions/download-artifact@v4
+        with:
+          pattern: digests-localai${{ inputs.tag-suffix == '' && '-core' || inputs.tag-suffix }}-*
+          merge-multiple: true
+          path: /tmp/digests
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@master
+
+      - name: Login to DockerHub
+        if: github.event_name != 'pull_request'
+        uses: docker/login-action@v4
+        with:
+          username: ${{ secrets.dockerUsername }}
+          password: ${{ secrets.dockerPassword }}
+
+      - name: Login to Quay.io
+        uses: docker/login-action@v4
+        with:
+          registry: quay.io
+          username: ${{ secrets.quayUsername }}
+          password: ${{ secrets.quayPassword }}
+
+      - name: Docker meta
+        id: meta
+        uses: docker/metadata-action@v6
+        with:
+          images: |
+            quay.io/go-skynet/local-ai
+            localai/localai
+          tags: |
+            type=ref,event=branch
+            type=semver,pattern={{raw}}
+            type=sha
+          flavor: |
+            latest=${{ inputs.tag-latest }}
+            suffix=${{ inputs.tag-suffix }},onlatest=true
+
+      - name: Create manifest list and push (quay)
+        working-directory: /tmp/digests
+        run: |
+          set -euo pipefail
+          tags=$(jq -cr '.tags | map(select(startswith("quay.io/"))) | map("-t " + .) | join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON")
+          if [ -z "$tags" ]; then
+            echo "No quay.io tags from docker/metadata-action; skipping quay merge"
+          else
+            # shellcheck disable=SC2086
+            docker buildx imagetools create $tags \
+              $(printf 'quay.io/go-skynet/local-ai@sha256:%s ' *)
+          fi
+
+      - name: Create manifest list and push (dockerhub)
+        if: github.event_name != 'pull_request'
+        working-directory: /tmp/digests
+        run: |
+          set -euo pipefail
+          tags=$(jq -cr '.tags | map(select(startswith("localai/"))) | map("-t " + .) | join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON")
+          if [ -z "$tags" ]; then
+            echo "No dockerhub tags from docker/metadata-action; skipping dockerhub merge"
+          else
+            # shellcheck disable=SC2086
+            docker buildx imagetools create $tags \
+              $(printf 'localai/localai@sha256:%s ' *)
+          fi
+
+      - name: Inspect manifest
+        run: |
+          set -euo pipefail
+          first_tag=$(jq -cr '.tags[0]' <<< "$DOCKER_METADATA_OUTPUT_JSON")
+          if [ -n "$first_tag" ] && [ "$first_tag" != "null" ]; then
+            docker buildx imagetools inspect "$first_tag"
+          fi
+
+      - name: Job summary
+        run: |
+          set -euo pipefail
+          echo "Merged manifest tags:" >> "$GITHUB_STEP_SUMMARY"
+          jq -r '.tags[]' <<< "$DOCKER_METADATA_OUTPUT_JSON" | sed 's/^/- /' >> "$GITHUB_STEP_SUMMARY"
+          echo >> "$GITHUB_STEP_SUMMARY"
+          echo "Per-arch digests:" >> "$GITHUB_STEP_SUMMARY"
+          ls -1 /tmp/digests | sed 's/^/- sha256:/' >> "$GITHUB_STEP_SUMMARY"
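The hand-off between the "Export digest" step in image_build.yml and the "Create manifest list and push" steps above relies on empty files whose names are the hex part of each digest. The following bash sketch reproduces that idea outside of CI, under stated assumptions: record_digest and digest_refs are hypothetical helper names, the digests are made up, and the final command is only echoed, never run against a registry.

```shell
#!/usr/bin/env bash

# Hypothetical helper: store a digest as an empty file named after its hex
# part, as the "Export digest" step does with ${digest#sha256:}.
record_digest() {
  local dir="$1" digest="$2"
  mkdir -p "$dir"
  touch "$dir/${digest#sha256:}"
}

# Hypothetical helper: turn every file name in the directory into
# "<repo>@sha256:<hex> ", the argument form passed to imagetools create.
# Assumes the directory is not empty; in the workflow, the upload step's
# if-no-files-found: error enforces that upstream.
digest_refs() {
  local dir="$1" repo="$2" f
  (
    cd "$dir" || exit 1
    for f in *; do
      printf '%s@sha256:%s ' "$repo" "$f"
    done
  )
}

workdir=$(mktemp -d)
record_digest "$workdir" "sha256:aaaa1111"  # made-up amd64 digest
record_digest "$workdir" "sha256:bbbb2222"  # made-up arm64 digest

refs=$(digest_refs "$workdir" "quay.io/go-skynet/local-ai")

# Echo rather than execute: <tag> stands for the -t arguments that the
# workflow derives from docker/metadata-action.
echo "docker buildx imagetools create -t <tag> $refs"
```

Because the same file names are reused for both registries, this only works when each registry holds the image under an identical digest, which is the property that provenance: false restores.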