mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-27 18:06:58 -04:00
feat(paged): Metal/darwin build availability for llama-cpp-localai-paged
Close the single build-targeting gap the cross-arch audit (ARCH_GENERALITY_AUDIT.md section 6, item 2) flagged: the paged backend had no Metal/darwin variant and no metal: capability key, so a Mac user selecting llama-cpp-localai-paged fell back to default=cpu (a Linux image) that does not run, with no fallthrough to stock llama-cpp.

Mirror exactly how stock llama-cpp does darwin:

- .github/backend-matrix.yml: add the includeDarwin row (-metal-darwin-arm64-llama-cpp-localai-paged, arch arm64, lang go) next to the stock llama-cpp darwin row.
- backend/index.yaml: add the metal: capability key to the llama-cpp-localai-paged meta-backend plus the metal-llama-cpp-localai-paged and -development variant entries (URIs match the matrix tag-suffix); add Metal to tags.
- scripts/build/llama-cpp-localai-paged-darwin.sh: new bespoke darwin build, a line-for-line mirror of llama-cpp-darwin.sh swapping the paged wrapper dir, binary names, ggml-shared-libs dir and output tar. Same CPU_ALL_VARIANTS + Metal path (GGML_METAL=ON via the reused llama-cpp Makefile when OS=Darwin; --target ggml pulls in ggml-metal via add_dependencies) with LLAMA_PAGED=on.
- Makefile: add backends/llama-cpp-localai-paged-darwin target (+ .NOTPARALLEL).
- .github/workflows/backend_build_darwin.yml: give the paged backend the same bespoke darwin build step as stock llama-cpp, share the llama ccache restore (save stays stock-only to avoid a same-run key collision), and exclude it from the generic build-darwin-go-backend step.
- scripts/changed-backends.js: comment-only. The paged darwin path mapping was already present (forward-looking); update the stale "if a metal row is ever added" note now that the row exists.

Metal delivers paged-KV only (NVFP4 FP4-MMA is CUDA/Blackwell-only); the GDN/conv fused ops have no Metal kernel, so a gated-DeltaNet (qwen35) model falls back to the CPU reference op at runtime. That fallback is made SAFE by the fused-op backend gate (patch 0030).

This is config; the Metal build runs in CI on the next push and is runtime-tested on the M4 Mac.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
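The capability fallback described in the message can be pictured with a small shell sketch. It is illustrative only: `resolve_variant` and the `cpu-llama-cpp-localai-paged` name are hypothetical stand-ins, and the real lookup lives in LocalAI's backend gallery code rather than in a script. Only the `metal-llama-cpp-localai-paged` variant name is taken from the commit message.

```shell
#!/bin/sh
# Sketch of capability-keyed variant selection with a default fallback.
# resolve_variant and the cpu-* variant name are hypothetical; only
# metal-llama-cpp-localai-paged is named in the commit message.

resolve_variant() {
  # $1: detected capability (for example "metal")
  # $2: "yes" if the meta-backend declares a metal: key, "no" otherwise
  if [ "$1" = "metal" ] && [ "$2" = "yes" ]; then
    echo "metal-llama-cpp-localai-paged"
  else
    # No matching key: the default entry is chosen (a Linux CPU image).
    echo "cpu-llama-cpp-localai-paged"
  fi
}

resolve_variant metal no    # models the index before this commit
resolve_variant metal yes   # models the index after this commit
```

Under these assumptions the first call yields the CPU entry, which is the broken case on a Mac, and the second yields the Metal variant.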
25
.github/workflows/backend_build_darwin.yml
vendored
@@ -169,14 +169,14 @@ jobs:
       # invalidates cleanly; restore-keys fall back to the latest entry for the
       # same pin so unchanged TUs stay warm even when the cache is fresh.
       - name: Compute llama.cpp version
-        if: inputs.backend == 'llama-cpp'
+        if: inputs.backend == 'llama-cpp' || inputs.backend == 'llama-cpp-localai-paged'
         id: llama-version
         run: |
           version=$(grep '^LLAMA_VERSION' backend/cpp/llama-cpp/Makefile | head -1 | cut -d= -f2 | cut -d'?' -f1 | tr -d ' ')
           echo "version=${version}" >> "$GITHUB_OUTPUT"
 
       - name: Restore ccache
-        if: inputs.backend == 'llama-cpp'
+        if: inputs.backend == 'llama-cpp' || inputs.backend == 'llama-cpp-localai-paged'
         id: ccache-cache
         uses: actions/cache/restore@v4
         with:
@@ -186,7 +186,7 @@ jobs:
             ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-
 
       - name: Configure ccache
-        if: inputs.backend == 'llama-cpp'
+        if: inputs.backend == 'llama-cpp' || inputs.backend == 'llama-cpp-localai-paged'
         run: |
           mkdir -p "$HOME/Library/Caches/ccache"
           ccache -M 2G
@@ -230,6 +230,16 @@ jobs:
           make protogen-go
           make backends/llama-cpp-darwin
 
+      # llama-cpp-localai-paged reuses the same bespoke llama-cpp darwin build path
+      # (CPU_ALL_VARIANTS + Metal + otool dylib bundling) via its own wrapper script,
+      # so it gets a dedicated step like stock llama-cpp rather than the generic
+      # build-darwin-go-backend mold.
+      - name: Build ${{ inputs.backend }}-darwin (llama-cpp-localai-paged)
+        if: inputs.backend == 'llama-cpp-localai-paged'
+        run: |
+          make protogen-go
+          make backends/llama-cpp-localai-paged-darwin
+
       - name: Build ds4 backend (Darwin Metal)
         if: inputs.backend == 'ds4'
         run: |
@@ -245,15 +255,20 @@ jobs:
           make backends/privacy-filter-darwin
 
       - name: Build ${{ inputs.backend }}-darwin
-        if: inputs.backend != 'llama-cpp' && inputs.backend != 'ds4' && inputs.backend != 'privacy-filter'
+        if: inputs.backend != 'llama-cpp' && inputs.backend != 'llama-cpp-localai-paged' && inputs.backend != 'ds4' && inputs.backend != 'privacy-filter'
         run: |
           make protogen-go
           BACKEND=${{ inputs.backend }} BUILD_TYPE=${{ inputs.build-type }} USE_PIP=${{ inputs.use-pip }} make build-darwin-${{ inputs.lang }}-backend
 
       - name: ccache stats
-        if: inputs.backend == 'llama-cpp'
+        if: inputs.backend == 'llama-cpp' || inputs.backend == 'llama-cpp-localai-paged'
         run: ccache -s
 
+      # Only stock llama-cpp persists the ccache: both backends share the same
+      # ccache-llama-<arch>-<version>-<run_id> key, so the paged job restores from
+      # the shared prefix (warm) but must NOT also save under the identical key in
+      # the same run (it would collide). The shared upstream TUs stay warm via the
+      # stock save; the paged-only patched TUs are a small recompile.
       - name: Save ccache
         if: inputs.backend == 'llama-cpp' && github.event_name != 'pull_request'
         uses: actions/cache/save@v4
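To make the shared cache key concrete, here is a minimal shell sketch of how the version pin and the restore-key prefix fit together. The pipeline inside `extract_llama_version` is the one from the "Compute llama.cpp version" step; the function names, the temporary Makefile, and the sample pin `abc123` are illustrative assumptions, not part of the workflow.

```shell
#!/bin/sh
# Sketch: extract the pinned version from a Makefile, then build the
# restore-key prefix. Helper names and sample values are hypothetical.

extract_llama_version() {
  # $1: path to a Makefile containing a line such as "LLAMA_VERSION?=<pin>"
  grep '^LLAMA_VERSION' "$1" | head -1 | cut -d= -f2 | cut -d'?' -f1 | tr -d ' '
}

ccache_key_prefix() {
  # $1: runner architecture (for example ARM64), $2: version pin
  printf 'ccache-llama-%s-%s-\n' "$1" "$2"
}

# Demo against a throwaway Makefile with a made-up pin.
demo_makefile=$(mktemp)
printf 'LLAMA_VERSION?=abc123\nOTHER=1\n' > "$demo_makefile"
version=$(extract_llama_version "$demo_makefile")
ccache_key_prefix ARM64 "$version"
rm -f "$demo_makefile"
```

Because both backends read the pin from the same Makefile, they compute the same prefix. That is why the paged job can restore a warm cache from the stock job, and why only one of them may save under the full key in a given run.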