chore(deps): bump mxschmitt/action-tmate from 3.23 to 3.24

Bumps [mxschmitt/action-tmate](https://github.com/mxschmitt/action-tmate) from 3.23 to 3.24.
- [Release notes](https://github.com/mxschmitt/action-tmate/releases)
- [Changelog](https://github.com/mxschmitt/action-tmate/blob/master/RELEASE.md)
- [Commits](https://github.com/mxschmitt/action-tmate/compare/v3.23...v3.24)

---
updated-dependencies:
- dependency-name: mxschmitt/action-tmate
  dependency-version: '3.24'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-06-26 01:16:58 -04:00 · 2026-06-22 19:42:05 +00:00
193 changed files with 1282 additions and 4206 deletions
--- a/.agents/adding-backends.md
+++ b/.agents/adding-backends.md
@@ -102,24 +102,6 @@ Multi-arch backends are NOT a single matrix entry with `platforms: 'linux/amd64,

 Entries whose `dockerfile` is `./backend/Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}` must also set a `builder-base-image` field pointing at a prebuilt base from `quay.io/go-skynet/ci-cache:base-grpc-*` (CI builds these via `.github/workflows/base-images.yml`). The mapping is by `(build-type, platforms)` — see existing entries for the pattern. CI uses these prebuilt bases to skip the gRPC compile (~25–35 min cold). Local `make backends/<name>` ignores `builder-base-image` and uses the from-source path inside the Dockerfile, so you don't need quay access for local builds.

-### Cover every OS the project supports (Linux **and** Darwin)
-
-`.github/backend-matrix.yml` has two matrices, and they are the source of truth for which OS a backend ships on:
-
- `include:` — the **Linux** matrix (x86_64 + arm64; CPU and CUDA / ROCm / SYCL / Vulkan).
- `includeDarwin:` — the **macOS / Apple Silicon** matrix (arm64; Metal where the engine supports it, otherwise a native arm64 CPU build).
-
-**A new backend must target every OS it can build for — do not ship Linux-only by default.** A backend that appears only under `include:` is silently unavailable on macOS even when its code would run there. Most C/C++/GGML engines build on Darwin out of the box (ggml defaults `GGML_METAL=ON` on Apple, so a plain build is Metal-enabled), and many Python backends do too (CPU / MPS wheels). If a backend genuinely cannot support an OS (e.g. CUDA-only, no CPU variant), state that in the PR description instead of omitting it silently.
-
-Wiring a backend into `includeDarwin:` is more than the matrix entry:
-
-1. **`includeDarwin:` entry** — `tag-suffix: "-metal-darwin-arm64-<backend>"`, `build-type: "metal"`, `lang: "go"` for go+ggml backends; omit `build-type` for the bespoke C++ ones (llama-cpp / ds4 / privacy-filter). Match an existing entry of the same shape.
-2. **`backend/index.yaml`** — add `metal:` to the backend's `capabilities` map (main and `-development`) and concrete `metal-<backend>` / `metal-<backend>-development` image entries pointing at the `-metal-darwin-arm64-<backend>` images.
-3. **C/C++ backends only** — add an `inferBackendPathDarwin` case in `scripts/changed-backends.js` returning `backend/cpp/<backend>/` (the generic fallthrough assumes `backend/<lang>/`, which is wrong for a C++ source tree driven with `lang: go`), and give `run.sh` a Darwin branch that exports `DYLD_LIBRARY_PATH` instead of `LD_LIBRARY_PATH`. If the build is bespoke (single `grpc-server` + dylib bundling), model it on `scripts/build/ds4-darwin.sh` and add a `backends/<backend>-darwin` make target plus a gated step in `.github/workflows/backend_build_darwin.yml`.
-4. **C++ proto gotcha** — if the backend compiles the generated gRPC/protobuf in a separate CMake target (e.g. `hw_grpc_proto`), that target must link `protobuf::libprotobuf` + `gRPC::grpc++` so the Homebrew include dirs propagate; otherwise macOS fails with `google/protobuf/runtime_version.h not found` (Linux hides this because apt headers sit in `/usr/include`).
-
-The CI path filter only builds a backend on a PR when a file under its directory changes, so a darwin-only YAML edit builds nothing — touch a file under `backend/<lang>/<backend>/` (a one-line comment is enough) in the same PR.
-
 ## 3. Add Backend Metadata to `backend/index.yaml`

 **Step 3a: Add Meta Definition**
@@ -243,7 +225,6 @@ After adding a new backend, verify:

 - [ ] Backend directory structure is complete with all necessary files
 - [ ] Build configurations added to `.github/backend-matrix.yml` for all desired platforms (per-arch entries with `platform-tag` for multi-arch; `builder-base-image` for llama-cpp / ik-llama-cpp / turboquant)
- [ ] **OS coverage considered**: added to `includeDarwin:` (macOS/Apple Silicon) if the backend can build there — with the `backend/index.yaml` `metal:` capability + `metal-<backend>` image entries, a `run.sh` Darwin/DYLD branch and `inferBackendPathDarwin` case for C++ backends — or the PR explains why an OS is unsupported. Do not ship Linux-only by default.
 - [ ] Meta definition added to `backend/index.yaml` in the `## metas` section
 - [ ] Image entries added to `backend/index.yaml` for all build variants (latest + development)
 - [ ] Tag suffixes match between workflow file and index.yaml
--- a/.docker/llama-cpp-compile.sh
+++ b/.docker/llama-cpp-compile.sh
@@ -17,29 +17,19 @@ if [[ -n "${CUDA_DOCKER_ARCH:-}" ]]; then
  rm -rf /LocalAI/backend/cpp/llama-cpp-*-build
 fi

-cd /LocalAI/backend/cpp/llama-cpp
-if [ -z "${BUILD_TYPE:-}" ]; then
-  # Pure CPU image (BUILD_TYPE empty): one build with ggml CPU_ALL_VARIANTS replaces the
-  # per-microarch binaries (x86: avx/avx2/avx512/fallback; arm64: armv8.x/armv9.x). ggml
-  # dlopens the best libggml-cpu-*.so at runtime by probing host CPU features.
-  #
-  # arm64: the CPU_ALL_VARIANTS table includes armv9.2 SME variants whose -march=...+sme is
-  # rejected by the Ubuntu 24.04 default gcc-13. gcc-14 accepts it, so build the arm64
-  # variants with it (the host never *selects* SME unless it has it, but every variant must
-  # still compile).
-  if [ "${TARGETARCH}" = "arm64" ]; then
-    apt-get update -qq && apt-get install -y -qq gcc-14 g++-14
-    export CC=gcc-14 CXX=g++-14
-  fi
-  make llama-cpp-cpu-all
-else
-  # GPU build (cublas/hipblas/sycl/vulkan/...): the accelerator does the compute, so a
-  # single fallback CPU build is enough - no per-microarch CPU variants needed. (This also
-  # keeps the heavy GPU backend compile from also building the whole CPU variant matrix,
-  # and avoids the gcc-14 apt step on GPU base images such as nvidia l4t.)
+if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then
+  cd /LocalAI/backend/cpp/llama-cpp
  make llama-cpp-fallback
+  make llama-cpp-grpc
+  make llama-cpp-rpc-server
+else
+  cd /LocalAI/backend/cpp/llama-cpp
+  make llama-cpp-avx
+  make llama-cpp-avx2
+  make llama-cpp-avx512
+  make llama-cpp-fallback
+  make llama-cpp-grpc
+  make llama-cpp-rpc-server
 fi
-make llama-cpp-grpc
-make llama-cpp-rpc-server

 ccache -s || true
--- a/.docker/turboquant-compile.sh
+++ b/.docker/turboquant-compile.sh
@@ -19,21 +19,17 @@ fi

 cd /LocalAI/backend/cpp/turboquant

-if [ -z "${BUILD_TYPE:-}" ]; then
-  # Pure CPU image: one ggml CPU_ALL_VARIANTS build replaces the per-microarch binaries.
-  # arm64: the armv9.2 SME variants need gcc-14 (gcc-13 rejects +sme).
-  if [ "${TARGETARCH}" = "arm64" ]; then
-    apt-get update -qq && apt-get install -y -qq gcc-14 g++-14
-    export CC=gcc-14 CXX=g++-14
-  fi
-  make turboquant-cpu-all
-else
-  # GPU build (cublas/hipblas/sycl/vulkan/...): single fallback CPU build, the accelerator
-  # does the compute. Keeps the GPU compile from also building the CPU variant matrix and
-  # avoids the gcc-14 apt step on GPU base images such as nvidia l4t.
+if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then
  make turboquant-fallback
+  make turboquant-grpc
+  make turboquant-rpc-server
+else
+  make turboquant-avx
+  make turboquant-avx2
+  make turboquant-avx512
+  make turboquant-fallback
+  make turboquant-grpc
+  make turboquant-rpc-server
 fi
-make turboquant-grpc
-make turboquant-rpc-server

 ccache -s || true
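
Review note: after this change, `llama-cpp-compile.sh` and `turboquant-compile.sh` choose their make targets from `TARGETARCH` and `BUILD_TYPE` alone. The sketch below restates that selection. `select_targets` is a hypothetical helper (not part of the repository) that prints the target names instead of invoking `make`; the prefix argument is also an assumption, added for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not in the repository: prints the make targets the
# post-change compile scripts would build for a given arch and build type.
select_targets() {
  arch="$1"        # value of TARGETARCH, e.g. amd64 or arm64
  build_type="$2"  # value of BUILD_TYPE, may be empty
  prefix="$3"      # llama-cpp or turboquant

  if [ "$arch" = "arm64" ] || [ "$build_type" = "hipblas" ]; then
    # arm64 and hipblas builds get only the fallback CPU binary.
    variants="fallback"
  else
    # Every other combination builds the per-microarchitecture x86 binaries.
    variants="avx avx2 avx512 fallback"
  fi

  # grpc and rpc-server are built in both branches.
  for v in $variants grpc rpc-server; do
    printf '%s-%s\n' "$prefix" "$v"
  done
}

select_targets amd64 "" llama-cpp
```

One thing that may be worth checking in the real scripts: the new condition reads `${BUILD_TYPE}` without a `:-` default, while the context line above uses `${CUDA_DOCKER_ARCH:-}`. If the scripts run under `set -u` (an assumption; the `set` line is not visible in these hunks), an unset `BUILD_TYPE` would abort the script.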
--- a/.github/backend-matrix.yml
+++ b/.github/backend-matrix.yml
@@ -2,28 +2,6 @@
 # Matrix data for backend container image builds.
 # Consumed by scripts/changed-backends.js for both backend.yml and backend_pr.yml.
 # This file is NOT a workflow — it has no top-level 'on:' or 'jobs:'.
-#
-# OS / platform coverage — READ THIS WHEN ADDING A BACKEND
-# --------------------------------------------------------
-# This file is the source of truth for which OS each backend is built and
-# published for. A backend ships ONLY for the matrices it appears in:
-#   - Linux  -> the `include:` matrix below (x86_64 + arm64; CPU and
-#               CUDA / ROCm / SYCL / Vulkan variants).
-#   - macOS  -> the `includeDarwin:` matrix (Apple Silicon / arm64; Metal where
-#               the engine supports it, otherwise a native arm64 CPU build).
-#
-# New backends must target EVERY OS they can build for, not just Linux. A backend
-# listed only under `include:` is silently unavailable on macOS even when its code
-# would run there. Most C/C++/GGML engines build on Darwin (ggml defaults
-# GGML_METAL=ON on Apple, so a plain build is Metal-enabled), and many Python
-# backends do too (CPU / MPS). If a backend genuinely cannot support an OS, say so
-# in its PR description rather than silently omitting it.
-#
-# Adding a backend to `includeDarwin:` is more than one line — see the darwin
-# checklist in .agents/adding-backends.md (includeDarwin entry, the index.yaml
-# `metal:` capability + `metal-<backend>` image entries, a `run.sh` Darwin/DYLD
-# branch for C/C++ backends, and the inferBackendPathDarwin case in
-# scripts/changed-backends.js so the path filter actually builds it).

 # Linux matrix (consumed by backend-jobs).
 include:
@@ -4944,37 +4922,6 @@ includeDarwin:
    tag-suffix: "-metal-darwin-arm64-vibevoice-cpp"
    build-type: "metal"
    lang: "go"
-  # Vision/utility C++/ggml backends (go+cgo). Their Makefiles already carry a
-  # Darwin/Metal path (GGML_METAL=ON when build-type=metal); this just builds and
-  # publishes the metal image so Apple Silicon can install them.
-  - backend: "depth-anything-cpp"
-    tag-suffix: "-metal-darwin-arm64-depth-anything-cpp"
-    build-type: "metal"
-    lang: "go"
-  - backend: "locate-anything-cpp"
-    tag-suffix: "-metal-darwin-arm64-locate-anything-cpp"
-    build-type: "metal"
-    lang: "go"
-  - backend: "rfdetr-cpp"
-    tag-suffix: "-metal-darwin-arm64-rfdetr-cpp"
-    build-type: "metal"
-    lang: "go"
-  - backend: "sam3-cpp"
-    tag-suffix: "-metal-darwin-arm64-sam3-cpp"
-    build-type: "metal"
-    lang: "go"
-  # privacy-filter (PII/NER) is a C++/ggml backend built by a bespoke darwin
-  # script (make backends/privacy-filter-darwin); ggml defaults Metal ON on Apple
-  # so the build is Metal-enabled. lang=go drives runner/toolchain selection only.
-  - backend: "privacy-filter"
-    tag-suffix: "-metal-darwin-arm64-privacy-filter"
-    lang: "go"
-  # LocalVQE has no Metal path; on Apple Silicon it builds CPU-only (GGML_METAL
-  # OFF) but is still a native arm64 image. Uses the darwin/metal build profile.
-  - backend: "localvqe"
-    tag-suffix: "-metal-darwin-arm64-localvqe"
-    build-type: "metal"
-    lang: "go"
  - backend: "voxtral"
    tag-suffix: "-metal-darwin-arm64-voxtral"
    build-type: "metal"
@@ -5027,19 +4974,6 @@ includeDarwin:
  - backend: "kitten-tts"
    tag-suffix: "-metal-darwin-arm64-kitten-tts"
    build-type: "mps"
-  # vLLM on Apple Silicon via vllm-metal (MLX). The install is custom
-  # (backend/python/vllm/install.sh has a darwin branch); lang stays python so
-  # backend_build_darwin.yml drives it through build-darwin-python-backend ->
-  # scripts/build/python-darwin.sh, which runs the backend's install.sh.
-  - backend: "vllm"
-    tag-suffix: "-metal-darwin-arm64-vllm"
-    build-type: "mps"
-  - backend: "trl"
-    tag-suffix: "-metal-darwin-arm64-trl"
-    build-type: "mps"
-  - backend: "liquid-audio"
-    tag-suffix: "-metal-darwin-arm64-liquid-audio"
-    build-type: "mps"
  - backend: "piper"
    tag-suffix: "-metal-darwin-arm64-piper"
    build-type: "metal"
@@ -5056,10 +4990,6 @@ includeDarwin:
    tag-suffix: "-metal-darwin-arm64-sherpa-onnx"
    build-type: "metal"
    lang: "go"
-  - backend: "supertonic"
-    tag-suffix: "-metal-darwin-arm64-supertonic"
-    build-type: "metal"
-    lang: "go"
  - backend: "local-store"
    tag-suffix: "-metal-darwin-arm64-local-store"
    build-type: "metal"
--- a/.github/bump_vllm_metal.sh
+++ b/.github/bump_vllm_metal.sh
@@ -1,55 +0,0 @@
-#!/bin/bash
-# Bump the single vllm-metal pin (VLLM_METAL_VERSION) in the vLLM backend's
-# darwin (Apple Silicon) install path. The macOS/Metal build
-# (backend/python/vllm/install.sh, Darwin branch) installs vllm-metal, which is
-# version-locked to a specific vLLM source release. install.sh derives that vLLM
-# version at build time from vllm-metal's own installer (`vllm_v=`) at the pinned
-# tag, so there is only ONE value to bump here -- mirroring bump_vllm_wheel.sh,
-# which bumps the Linux cu130 wheel pin.
-#
-# This deliberately tracks vllm-project/vllm-metal, NOT vllm-project/vllm: the
-# darwin build can only use the exact vLLM version vllm-metal supports, so it may
-# lag the Linux pin (requirements-cublas13-after.txt) until vllm-metal catches up.
-set -xe
-REPO=$1   # vllm-project/vllm-metal
-FILE=$2   # backend/python/vllm/install.sh
-VAR=$3    # VLLM_METAL_VERSION (used for the workflow's output file names)
-
-if [ -z "$FILE" ] || [ -z "$REPO" ] || [ -z "$VAR" ]; then
-    echo "usage: $0 <repo> <install-file> <var-name>" >&2
-    exit 1
-fi
-
-# vllm-metal ships frequent dev releases, all flagged as non-prerelease, so
-# /releases/latest returns the newest one (with its cp312 wheel asset).
-LATEST_TAG=$(curl -sS -H "Accept: application/vnd.github+json" \
-    "https://api.github.com/repos/$REPO/releases/latest" \
-    | python3 -c "import json,sys; print(json.load(sys.stdin)['tag_name'])")
-
-# The coupled vLLM source version lives in vllm-metal's installer at that tag.
-NEW_VLLM_VERSION=$(curl -fsSL \
-    "https://raw.githubusercontent.com/$REPO/$LATEST_TAG/install.sh" \
-    | grep -oE 'vllm_v="[0-9]+\.[0-9]+\.[0-9]+"' | head -1 | cut -d'"' -f2)
-
-if [ -z "$LATEST_TAG" ] || [ -z "$NEW_VLLM_VERSION" ]; then
-    echo "Could not resolve vllm-metal tag ($LATEST_TAG) or its vllm_v ($NEW_VLLM_VERSION)." >&2
-    exit 1
-fi
-
-set +e
-CURRENT_TAG=$(grep -oE 'VLLM_METAL_VERSION="[^"]*"' "$FILE" | head -1 | cut -d'"' -f2)
-set -e
-
-# Rewrite the single pin. install.sh derives VLLM_VERSION from this tag at build
-# time, so there is nothing else to touch. peter-evans/create-pull-request opens
-# no PR on a clean tree, so a no-op rewrite (already current) is safe.
-sed -i "$FILE" \
-    -e "s|VLLM_METAL_VERSION=\"[^\"]*\"|VLLM_METAL_VERSION=\"$LATEST_TAG\"|"
-
-if [ -z "$CURRENT_TAG" ]; then
-    echo "Could not find VLLM_METAL_VERSION=\"...\" in $FILE." >&2
-    exit 0
-fi
-
-echo "vllm-metal ${CURRENT_TAG} -> ${LATEST_TAG} (builds vLLM ${NEW_VLLM_VERSION}): https://github.com/$REPO/releases/tag/${LATEST_TAG}" >> "${VAR}_message.txt"
-echo "${LATEST_TAG}" >> "${VAR}_commit.txt"
--- a/.github/workflows/backend_build_darwin.yml
+++ b/.github/workflows/backend_build_darwin.yml
@@ -228,17 +228,8 @@ jobs:
        run: |
          make backends/ds4-darwin

-      # privacy-filter is a C++/ggml backend like ds4 - a single grpc-server with
-      # otool dylib bundling - so it gets its own bespoke darwin script rather than
-      # the generic build-darwin-go-backend path.
-      - name: Build privacy-filter backend (Darwin Metal)
-        if: inputs.backend == 'privacy-filter'
-        run: |
-          make protogen-go
-          make backends/privacy-filter-darwin
-
      - name: Build ${{ inputs.backend }}-darwin
-        if: inputs.backend != 'llama-cpp' && inputs.backend != 'ds4' && inputs.backend != 'privacy-filter'
+        if: inputs.backend != 'llama-cpp' && inputs.backend != 'ds4'
        run: |
          make protogen-go
          BACKEND=${{ inputs.backend }} BUILD_TYPE=${{ inputs.build-type }} USE_PIP=${{ inputs.use-pip }} make build-darwin-${{ inputs.lang }}-backend
--- a/.github/workflows/bump_deps.yaml
+++ b/.github/workflows/bump_deps.yaml
@@ -154,39 +154,3 @@ jobs:
          branch: "update/VLLM_VERSION"
          body: ${{ steps.bump.outputs.message }}
          signoff: true
-
-  bump-vllm-metal:
-    # The darwin (Apple Silicon) vLLM build installs vllm-metal, which is locked
-    # to a specific vLLM source release. install.sh pins both VLLM_METAL_VERSION
-    # (the wheel release) and VLLM_VERSION (the vLLM it builds against); this job
-    # tracks vllm-project/vllm-metal and rewrites both atomically. Separate from
-    # bump-vllm-wheel because darwin follows vllm-metal, not vllm/vllm latest.
-    if: github.repository == 'mudler/LocalAI'
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v7
-      - name: Bump vllm-metal pin 🔧
-        id: bump
-        run: |
-          bash .github/bump_vllm_metal.sh vllm-project/vllm-metal backend/python/vllm/install.sh VLLM_METAL_VERSION
-          {
-            echo 'message<<EOF'
-            cat "VLLM_METAL_VERSION_message.txt"
-            echo EOF
-          } >> "$GITHUB_OUTPUT"
-          {
-            echo 'commit<<EOF'
-            cat "VLLM_METAL_VERSION_commit.txt"
-            echo EOF
-          } >> "$GITHUB_OUTPUT"
-          rm -rfv VLLM_METAL_VERSION_message.txt VLLM_METAL_VERSION_commit.txt
-      - name: Create Pull Request
-        uses: peter-evans/create-pull-request@v8
-        with:
-          token: ${{ secrets.UPDATE_BOT_TOKEN }}
-          push-to-fork: ci-forks/LocalAI
-          commit-message: ':arrow_up: Update vllm-project/vllm-metal (darwin)'
-          title: 'chore: :arrow_up: Update vllm-metal (darwin) to `${{ steps.bump.outputs.commit }}`'
-          branch: "update/VLLM_METAL_VERSION"
-          body: ${{ steps.bump.outputs.message }}
-          signoff: true
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -71,7 +71,7 @@ jobs:
          if-no-files-found: ignore
      - name: Setup tmate session if tests fail
        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.23
+        uses: mxschmitt/action-tmate@v3.24
        with:
          detached: true
          connect-timeout-seconds: 180
@@ -116,7 +116,7 @@ jobs:
          PATH="$PATH:$HOME/go/bin" BUILD_TYPE="GITHUB_CI_HAS_BROKEN_METAL" CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF" make --jobs 4 --output-sync=target test
      - name: Setup tmate session if tests fail
        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.23
+        uses: mxschmitt/action-tmate@v3.24
        with:
          detached: true
          connect-timeout-seconds: 180
--- a/.github/workflows/tests-aio.yml
+++ b/.github/workflows/tests-aio.yml
@@ -79,7 +79,7 @@ jobs:
            PATH="$PATH:$HOME/go/bin" make backends/local-store backends/silero-vad backends/llama-cpp backends/whisper backends/piper backends/stablediffusion-ggml docker-build-e2e e2e-aio
      - name: Setup tmate session if tests fail
        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.23
+        uses: mxschmitt/action-tmate@v3.24
        with:
          detached: true
          connect-timeout-seconds: 180
--- a/.github/workflows/tests-e2e.yml
+++ b/.github/workflows/tests-e2e.yml
@@ -57,7 +57,7 @@ jobs:
          PATH="$PATH:$HOME/go/bin" make build-mock-backend test-e2e
      - name: Setup tmate session if tests fail
        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.23
+        uses: mxschmitt/action-tmate@v3.24
        with:
          detached: true
          connect-timeout-seconds: 180
--- a/.github/workflows/tests-pii-ner-e2e.yml
+++ b/.github/workflows/tests-pii-ner-e2e.yml
@@ -90,7 +90,7 @@ jobs:
        run: PATH="$PATH:$HOME/go/bin" make test-extra-backend-privacy-filter
      - name: Setup tmate session if tests fail
        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.23
+        uses: mxschmitt/action-tmate@v3.24
        with:
          detached: true
          connect-timeout-seconds: 180
--- a/.github/workflows/tests-ui-e2e.yml
+++ b/.github/workflows/tests-ui-e2e.yml
@@ -75,7 +75,7 @@ jobs:
          retention-days: 7
      - name: Setup tmate session if tests fail
        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.23
+        uses: mxschmitt/action-tmate@v3.24
        with:
          detached: true
          connect-timeout-seconds: 180
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -43,5 +43,4 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]
 - **New API endpoints**: LocalAI advertises its capability surface in several independent places — swagger `@Tags`, `/api/instructions` registry, auth `RouteFeatureRegistry`, React UI `capabilities.js`, docs. Read [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) and follow its checklist — missing any surface means clients, admins, and the UI won't know the endpoint exists.
 - **Admin endpoints → MCP tool**: every admin endpoint that an admin would manage conversationally (install/list/edit/toggle/upgrade) MUST also be exposed as an MCP tool in `pkg/mcp/localaitools/`. The LocalAI Assistant chat modality and the standalone `local-ai mcp-server` consume that package; drift between REST and MCP is a real risk. Read [.agents/localai-assistant-mcp.md](.agents/localai-assistant-mcp.md) — the `TestToolHTTPRouteMappingComplete` test fails until you wire the new tool and update the route map.
 - **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
- **Backend OS coverage**: a new backend must target every OS it can build for, not just Linux. `.github/backend-matrix.yml` has two matrices — `include:` (Linux) and `includeDarwin:` (macOS / Apple Silicon). Most C/C++/GGML and many Python backends build on Darwin too — wire the `includeDarwin` entry + `backend/index.yaml` `metal:` entries, or say in the PR why an OS is unsupported. See the darwin checklist in [.agents/adding-backends.md](.agents/adding-backends.md).
 - **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI
--- a/Makefile
+++ b/Makefile
@@ -1,5 +1,5 @@
 # Disable parallel execution for backend builds
-.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/crispasr backends/parakeet-cpp backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/rfdetr-cpp backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/omnivoice-cpp backends/vibevoice-cpp backends/localvqe backends/tinygrad backends/sherpa-onnx backends/ds4 backends/ds4-darwin backends/liquid-audio backends/supertonic backends/depth-anything-cpp backends/privacy-filter backends/privacy-filter-darwin
+.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/crispasr backends/parakeet-cpp backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/rfdetr-cpp backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/omnivoice-cpp backends/vibevoice-cpp backends/localvqe backends/tinygrad backends/sherpa-onnx backends/ds4 backends/ds4-darwin backends/liquid-audio backends/supertonic backends/depth-anything-cpp backends/privacy-filter

 GOCMD=go
 GOTEST=$(GOCMD) test
@@ -1129,10 +1129,6 @@ backends/ds4-darwin: build
 	bash ./scripts/build/ds4-darwin.sh
 	./local-ai backends install "ocifile://$(abspath ./backend-images/ds4.tar)"

-backends/privacy-filter-darwin: build
-	bash ./scripts/build/privacy-filter-darwin.sh
-	./local-ai backends install "ocifile://$(abspath ./backend-images/privacy-filter.tar)"
-
 build-darwin-python-backend: build
 	bash ./scripts/build/python-darwin.sh

--- a/backend/cpp/ik-llama-cpp/Makefile
+++ b/backend/cpp/ik-llama-cpp/Makefile
@@ -1,5 +1,5 @@

-IK_LLAMA_VERSION?=b84902d2ad27c34f989f23947200c4b91b1568fd
+IK_LLAMA_VERSION?=6c00e87ac84404af588ad2e65935bd6f079c696f
 LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp

 CMAKE_ARGS?=
--- a/backend/cpp/ik-llama-cpp/run.sh
+++ b/backend/cpp/ik-llama-cpp/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -13,28 +13,28 @@ grep -e "flags" /proc/cpuinfo | head -1
 # ik_llama.cpp requires AVX2 — default to avx2 binary
 BINARY=ik-llama-cpp-avx2

-if [ -e "$CURDIR"/ik-llama-cpp-fallback ] && ! grep -q -e "\savx2\s" /proc/cpuinfo ; then
+if [ -e $CURDIR/ik-llama-cpp-fallback ] && ! grep -q -e "\savx2\s" /proc/cpuinfo ; then
 	echo "CPU:    AVX2   NOT found, using fallback"
 	BINARY=ik-llama-cpp-fallback
 fi

 # Extend ld library path with the dir where this script is located/lib
 if [ "$(uname)" == "Darwin" ]; then
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-	#export DYLD_FALLBACK_LIBRARY_PATH="$CURDIR"/lib:$DYLD_FALLBACK_LIBRARY_PATH
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+	#export DYLD_FALLBACK_LIBRARY_PATH=$CURDIR/lib:$DYLD_FALLBACK_LIBRARY_PATH
 else
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 fi

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using binary: $BINARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/$BINARY "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/$BINARY "$@"
 fi

 echo "Using binary: $BINARY"
-exec "$CURDIR"/$BINARY "$@"
+exec $CURDIR/$BINARY "$@"

 # We should never reach this point, however just in case we do, run fallback
-exec "$CURDIR"/ik-llama-cpp-fallback "$@"
+exec $CURDIR/ik-llama-cpp-fallback "$@"
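
Review note: the hunks above remove the double quotes around `$CURDIR` and `$0` in `run.sh`. The sketch below shows what unquoted expansion does when the path contains a space. The directory name and the `count_args` helper are made up for illustration and do not come from the repository.

```shell
#!/bin/sh
# Hypothetical install location containing a space (illustration only).
CURDIR="/opt/local ai/backends"

# Prints how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell splits the expansion on whitespace into two words.
unquoted=$(count_args $CURDIR/lib/ld.so)

# Quoted: the path stays a single word.
quoted=$(count_args "$CURDIR"/lib/ld.so)

echo "unquoted=$unquoted quoted=$quoted"
# prints "unquoted=2 quoted=1"
```

With the unquoted form, `[ -f $CURDIR/lib/ld.so ]` and `exec $CURDIR/$BINARY "$@"` would receive a split path if the backend were installed under a directory with whitespace in its name. Whether that can happen in practice depends on where backends are installed, which this diff does not show.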
--- a/backend/cpp/llama-cpp/CMakeLists.txt
+++ b/backend/cpp/llama-cpp/CMakeLists.txt
@@ -50,13 +50,8 @@ add_custom_command(
        "${hw_proto}"
      DEPENDS "${hw_proto}")

-# hw_grpc_proto: force STATIC. Under the CPU_ALL_VARIANTS build BUILD_SHARED_LIBS=ON
-# (ggml/llama become shared), which would otherwise make this glue library a DSO. As a
-# DSO it references the hidden-visibility symbols in the static libprotobuf.a, which the
-# linker cannot satisfy ("hidden symbol ... in libprotobuf.a is referenced by DSO").
-# Keeping it STATIC links protobuf/gRPC directly into the grpc-server executable while
-# only ggml/llama stay shared. No effect on the static variants (already BUILD_SHARED_LIBS=OFF).
-add_library(hw_grpc_proto STATIC
+# hw_grpc_proto
+add_library(hw_grpc_proto
  ${hw_grpc_srcs}
  ${hw_grpc_hdrs}
  ${hw_proto_srcs}
--- a/backend/cpp/llama-cpp/Makefile
+++ b/backend/cpp/llama-cpp/Makefile
@@ -1,5 +1,5 @@

-LLAMA_VERSION?=9d5d882d8cd0f0a9283d87ed5e6fe3ee0d925fb1
+LLAMA_VERSION?=7c082bc417bbe53210a83df4ba5b49e18ce6193c
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

 CMAKE_ARGS?=
@@ -10,16 +10,8 @@ TARGET?=--target grpc-server
 JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)
 ARCH?=$(shell uname -m)

-# Shared libs default to OFF: we link static gRPC and the avx/avx2/avx512/fallback
-# variants are fully static. The CPU_ALL_VARIANTS build flips SHARED_LIBS=ON (ggml/llama
-# become shared so the dynamic CPU backends work; gRPC stays static via its imported
-# targets). SHARED_LIBS is a make variable, not an appended -D, so it survives the
-# recursive sub-make into the VARIANT build dir (which re-parses this Makefile) instead
-# of being re-clobbered by a second -DBUILD_SHARED_LIBS=OFF. EXTRA_CMAKE_ARGS is the hook
-# the CPU_ALL_VARIANTS target uses to inject -DGGML_BACKEND_DL/-DGGML_CPU_ALL_VARIANTS.
-SHARED_LIBS?=OFF
-EXTRA_CMAKE_ARGS?=
-CMAKE_ARGS+=-DBUILD_SHARED_LIBS=$(SHARED_LIBS) -DLLAMA_CURL=OFF $(EXTRA_CMAKE_ARGS)
+# Disable Shared libs as we are linking on static gRPC and we can't mix shared and static
+CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF

 CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
 ifeq ($(NATIVE),false)
@@ -128,30 +120,6 @@ llama-cpp-fallback: llama.cpp
 	CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server
 	cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-fallback-build/grpc-server llama-cpp-fallback

-# Single-build CPU backend using ggml's CPU_ALL_VARIANTS. Produces ONE grpc-server
-# plus a set of dlopen-able libggml-cpu-*.so (sandybridge/haswell/skylakex/...) that
-# ggml's backend registry selects from at runtime by probing host CPU features.
-# Replaces the avx/avx2/avx512/fallback multi-binary build on x86.
-#
-# CPU_ALL_VARIANTS requires GGML_BACKEND_DL, which requires BUILD_SHARED_LIBS=ON, so we
-# pass SHARED_LIBS=ON and the DL flags as make variables (NOT pre-expanded into the
-# CMAKE_ARGS env string): command-line make variables propagate through every recursive
-# sub-make, so the deepest VARIANT-dir build computes BUILD_SHARED_LIBS=ON consistently.
-# Only ggml/llama go shared - gRPC is found via its static imported targets, so the
-# grpc-server binary keeps static gRPC and only dynamically links ggml.
-#
-# TARGET adds "ggml": the per-microarch backends are runtime-dlopened, not link deps of
-# grpc-server, so they only build because each is an add_dependencies() of the ggml target.
-llama-cpp-cpu-all: llama.cpp
-	cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-cpu-all-build
-	$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-cpu-all-build purge
-	$(info ${GREEN}I llama-cpp build info:cpu-all-variants${RESET})
-	$(MAKE) SHARED_LIBS=ON EXTRA_CMAKE_ARGS="-DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON" TARGET="--target grpc-server --target ggml" VARIANT="llama-cpp-cpu-all-build" build-llama-cpp-grpc-server
-	cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-cpu-all-build/grpc-server llama-cpp-cpu-all
-	rm -rf ggml-shared-libs && mkdir -p ggml-shared-libs
-	find $(CURRENT_MAKEFILE_DIR)/../llama-cpp-cpu-all-build/llama.cpp/build \( -name '*.so*' -o -name '*.dylib' \) -exec cp -av {} ggml-shared-libs/ \;
-	@echo "Collected ggml shared backends:" && ls -la ggml-shared-libs/
-
 llama-cpp-grpc: llama.cpp
 	cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build
 	$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build purge
--- a/backend/cpp/llama-cpp/grpc-server.cpp
+++ b/backend/cpp/llama-cpp/grpc-server.cpp
@@ -37,7 +37,6 @@
 #include "backend.pb.h"
 #include "backend.grpc.pb.h"
 #include "common.h"
-#include "arg.h"
 #include "chat-auto-parser.h"
 #include <getopt.h>
 #include <grpcpp/ext/proto_server_reflection_plugin.h>
@@ -593,10 +592,6 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
    params.checkpoint_min_step = 256;
 #endif

-    // Raw upstream llama-server flags collected from any option entry that
-    // starts with '-'. Applied once after the loop via common_params_parse.
-    std::vector<std::string> extra_argv;
-
     // decode options. Options are in form optname:optvale, or if booleans only optname.
    for (int i = 0; i < request->options_size(); i++) {
        std::string opt = request->options(i);
@@ -1085,31 +1080,6 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
                } catch (...) {}
            }

-        // --- main model MoE on CPU (upstream --cpu-moe / --n-cpu-moe) ---
-        } else if (!strcmp(optname, "cpu_moe")) {
-            // Bool-style flag: keep all MoE expert weights on CPU.
-            const bool enable = (optval == NULL) ||
-                optval_str == "true" || optval_str == "1" || optval_str == "yes" ||
-                optval_str == "on" || optval_str == "enabled";
-            if (enable) {
-                params.tensor_buft_overrides.push_back(llm_ffn_exps_cpu_override());
-            }
-        } else if (!strcmp(optname, "n_cpu_moe")) {
-            if (optval != NULL) {
-                try {
-                    int n = std::stoi(optval_str);
-                    if (n < 0) n = 0;
-                    // Keep override-name storage alive for the lifetime of the
-                    // params struct (mirrors upstream arg.cpp's function-local static).
-                    static std::list<std::string> buft_overrides_main;
-                    for (int i = 0; i < n; ++i) {
-                        buft_overrides_main.push_back(llm_ffn_exps_block_regex(i));
-                        params.tensor_buft_overrides.push_back(
-                            {buft_overrides_main.back().c_str(), ggml_backend_cpu_buffer_type()});
-                    }
-                } catch (...) {}
-            }
-
        // --- draft model tensor buffer overrides (upstream --spec-draft-override-tensor) ---
        } else if (!strcmp(optname, "draft_override_tensor") || !strcmp(optname, "spec_draft_override_tensor")) {
            // Format: <tensor regex>=<buffer type>,<tensor regex>=<buffer type>,...
@@ -1141,30 +1111,6 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
                else { cur.push_back(c); }
            }
            if (!cur.empty()) flush(cur);
-
-        // --- generic passthrough: any entry starting with '-' is a raw
-        //     upstream llama-server flag, forwarded verbatim to the parser. ---
-        } else if (optname[0] == '-') {
-            std::string flag = optname;
-            // These flags make upstream's parser exit() (printing usage /
-            // completion), which would kill the backend process. Skip them.
-            if (flag == "-h" || flag == "--help" || flag == "--usage" ||
-                flag == "--version" || flag == "--license" ||
-                flag == "--list-devices" || flag == "-cl" ||
-                flag == "--cache-list" ||
-                flag.rfind("--completion", 0) == 0) {
-                fprintf(stderr,
-                    "[llama-cpp] ignoring passthrough flag that would exit: %s\n",
-                    flag.c_str());
-            } else {
-                extra_argv.push_back(flag);
-                // Preserve the whole value after the first ':' so embedded
-                // colons (e.g. host:port) survive strtok's truncation of optval.
-                auto colon = opt.find(':');
-                if (colon != std::string::npos) {
-                    extra_argv.push_back(opt.substr(colon + 1));
-                }
-            }
        }
    }

@@ -1200,6 +1146,27 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
        }
    }

+    if (!params.kv_overrides.empty()) {
+        params.kv_overrides.emplace_back();
+        params.kv_overrides.back().key[0] = 0;
+    }
+
+    // tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp).
+    // Real entries are pushed during option parsing; here we pad/terminate so the
+    // model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543)
+    // and so llama_params_fit has the placeholder slots it requires.
+    {
+        const size_t ntbo = llama_max_tensor_buft_overrides();
+        while (params.tensor_buft_overrides.size() < ntbo) {
+            params.tensor_buft_overrides.push_back({nullptr, nullptr});
+        }
+    }
+    // Terminate the draft tensor_buft_overrides list with a sentinel, mirroring
+    // the main-model handling above.
+    if (!params.speculative.draft.tensor_buft_overrides.empty()) {
+        params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr});
+    }
+
    // TODO: Add yarn

    if (!request->tensorsplit().empty()) {
@@ -1292,69 +1259,6 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
            params.sampling.grammar_triggers.push_back(std::move(trigger));
        }
    }
-
-    // Apply any raw upstream flags last so an explicit passthrough flag wins
-    // over the LocalAI-resolved field it maps to (e.g. --ctx-size beats
-    // context_size). This is the same parser llama-server itself uses.
-    if (!extra_argv.empty()) {
-        // common_params_parser_init resets a few fields for the SERVER example
-        // (n_parallel -> -1, use_color). Snapshot n_parallel so an unrelated
-        // passthrough flag can't silently clobber LocalAI's resolved value.
-        const int saved_n_parallel = params.n_parallel;
-
-        std::vector<char *> argv;
-        std::string prog = "llama-server";
-        argv.push_back(prog.data());
-        for (auto & a : extra_argv) {
-            argv.push_back(a.data());
-        }
-
-        // ctx_arg.params is a reference, so this overlays the given flags onto
-        // `params` in place. Returns false on a recoverable parse error (and
-        // self-restores params); may exit() on a hard error, exactly as
-        // passing the same bad flag to llama-server would.
-        if (!common_params_parse((int)argv.size(), argv.data(), params,
-                                 LLAMA_EXAMPLE_SERVER)) {
-            fprintf(stderr,
-                "[llama-cpp] failed to parse passthrough options; ignoring them\n");
-        }
-
-        // Restore n_parallel unless a passthrough flag explicitly set it
-        // (parser_init's reset sentinel for SERVER is -1).
-        if (params.n_parallel == -1) {
-            params.n_parallel = saved_n_parallel;
-        }
-    }
-
-    // Terminate/pad the override vectors only after BOTH the named-option loop
-    // and the generic passthrough (common_params_parse above) have pushed their
-    // real entries, so back() is the null sentinel the model loader asserts on.
-    // Running these before the passthrough let a passthrough flag (--cpu-moe,
-    // --override-tensor, --override-kv, ...) append a real entry after the
-    // sentinel: a GGML_ASSERT crash for tensor_buft_overrides, a silent drop for
-    // kv_overrides. Double-termination is harmless (the while is a no-op if the
-    // passthrough parse already padded; an extra trailing null is ignored).
-
-    if (!params.kv_overrides.empty()) {
-        params.kv_overrides.emplace_back();
-        params.kv_overrides.back().key[0] = 0;
-    }
-
-    // tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp).
-    // Real entries are pushed during option parsing; here we pad/terminate so the
-    // model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543)
-    // and so llama_params_fit has the placeholder slots it requires.
-    {
-        const size_t ntbo = llama_max_tensor_buft_overrides();
-        while (params.tensor_buft_overrides.size() < ntbo) {
-            params.tensor_buft_overrides.push_back({nullptr, nullptr});
-        }
-    }
-    // Terminate the draft tensor_buft_overrides list with a sentinel, mirroring
-    // the main-model handling above.
-    if (!params.speculative.draft.tensor_buft_overrides.empty()) {
-        params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr});
-    }
 }


--- a/backend/cpp/llama-cpp/package.sh
+++ b/backend/cpp/llama-cpp/package.sh
@@ -14,22 +14,6 @@ mkdir -p $CURDIR/package/lib
 cp -avrf $CURDIR/llama-cpp-* $CURDIR/package/
 cp -rfv $CURDIR/run.sh $CURDIR/package/

-# Bundle the ggml shared backends produced by the CPU_ALL_VARIANTS build (libggml-base.so,
-# libggml.so, libllama.so and the per-microarch libggml-cpu-*.so), all into package/lib.
-#
-# Two distinct resolution mechanisms both land here:
-#   - NEEDED deps (libggml-base/libggml/libllama): resolved by the dynamic linker via the
-#     LD_LIBRARY_PATH=$CURDIR/lib that run.sh exports.
-#   - The per-microarch libggml-cpu-*.so are NOT linked; ggml *discovers* them at runtime by
-#     scanning the executable's own directory (readlink /proc/self/exe). run.sh launches via
-#     the bundled $CURDIR/lib/ld.so, so /proc/self/exe -> .../lib/ld.so and ggml scans lib/.
-#     That is why the variants must sit in lib/ (next to ld.so), not just on the link path.
-# No-op on builds (arm64/darwin) that don't produce the all-variants set.
-if [ -d "$CURDIR/ggml-shared-libs" ]; then
-    echo "Bundling ggml shared backends (CPU_ALL_VARIANTS)..."
-    cp -avf $CURDIR/ggml-shared-libs/*.so* $CURDIR/package/lib/
-fi
-
 # Detect architecture and copy appropriate libraries
 if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
    # x86_64 architecture
--- a/backend/cpp/llama-cpp/run.sh
+++ b/backend/cpp/llama-cpp/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -12,41 +12,55 @@ grep -e "flags" /proc/cpuinfo | head -1

 BINARY=llama-cpp-fallback

-# CPU images (x86, arm64, darwin) ship a single llama-cpp-cpu-all built with ggml
-# CPU_ALL_VARIANTS: ggml's backend registry dlopens the best libggml-cpu-*.so for this
-# host, so no shell-side AVX probing. GPU images (cublas/sycl/vulkan/hipblas) ship only
-# llama-cpp-fallback (the accelerator does the compute), so fall back to it when absent.
-if [ -e "$CURDIR"/llama-cpp-cpu-all ]; then
-	BINARY=llama-cpp-cpu-all
+if grep -q -e "\savx\s" /proc/cpuinfo ; then
+	echo "CPU:    AVX    found OK"
+	if [ -e $CURDIR/llama-cpp-avx ]; then
+		BINARY=llama-cpp-avx
+	fi
+fi
+
+if grep -q -e "\savx2\s" /proc/cpuinfo ; then
+	echo "CPU:    AVX2   found OK"
+	if [ -e $CURDIR/llama-cpp-avx2 ]; then
+		BINARY=llama-cpp-avx2
+	fi
+fi
+
+# Check avx 512
+if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
+	echo "CPU:    AVX512F found OK"
+	if [ -e $CURDIR/llama-cpp-avx512 ]; then
+		BINARY=llama-cpp-avx512
+	fi
 fi

 if [ -n "$LLAMACPP_GRPC_SERVERS" ]; then
-	if [ -e "$CURDIR"/llama-cpp-grpc ]; then
+	if [ -e $CURDIR/llama-cpp-grpc ]; then
 		BINARY=llama-cpp-grpc
 	fi
 fi
 
 # Extend ld library path with the dir where this script is located/lib
 if [ "$(uname)" == "Darwin" ]; then
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-	#export DYLD_FALLBACK_LIBRARY_PATH="$CURDIR"/lib:$DYLD_FALLBACK_LIBRARY_PATH
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+	#export DYLD_FALLBACK_LIBRARY_PATH=$CURDIR/lib:$DYLD_FALLBACK_LIBRARY_PATH
 else
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 	# Tell rocBLAS where to find TensileLibrary data (GPU kernel tuning files)
 	if [ -d "$CURDIR/lib/rocblas/library" ]; then
-		export ROCBLAS_TENSILE_LIBPATH="$CURDIR"/lib/rocblas/library
+		export ROCBLAS_TENSILE_LIBPATH=$CURDIR/lib/rocblas/library
 	fi
 fi

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using binary: $BINARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/$BINARY "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/$BINARY "$@"
 fi

 echo "Using binary: $BINARY"
-exec "$CURDIR"/$BINARY "$@"
+exec $CURDIR/$BINARY "$@"

 # We should never reach this point, however just in case we do, run fallback
-exec "$CURDIR"/llama-cpp-fallback "$@"
+exec $CURDIR/llama-cpp-fallback "$@"
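The probing cascade restored in this hunk can be read as a function of two inputs: the CPU flags and the binaries present in the package directory. Below is a minimal sketch of that selection logic, not the script itself. `select_binary` is a hypothetical helper that takes the cpuinfo path and the package directory as arguments, so it can be exercised without reading `/proc/cpuinfo`. It uses `grep -w` (whole-word match) in place of the `\s...\s` pattern in run.sh, and it leaves out the `LLAMACPP_GRPC_SERVERS` override.

```shell
#!/bin/bash
# Hypothetical helper, not part of run.sh: mirrors the AVX -> AVX2 -> AVX512F
# cascade. Later checks override earlier ones, so the last variant that is both
# advertised by the CPU and present on disk wins.
select_binary() {
	local cpuinfo="$1" dir="$2" binary="llama-cpp-fallback" pair flag name
	for pair in avx:avx avx2:avx2 avx512f:avx512; do
		flag="${pair%%:*}"              # CPU flag to look for
		name="llama-cpp-${pair##*:}"    # binary that needs that flag
		# -w matches the flag as a whole word, so "avx" does not match "avx2".
		if grep -qw -e "$flag" "$cpuinfo" && [ -e "$dir/$name" ]; then
			binary="$name"
		fi
	done
	echo "$binary"
}
```

Under these assumptions, `select_binary /proc/cpuinfo "$CURDIR"` would print the name that run.sh assigns to `BINARY` before the gRPC override is considered.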
--- a/backend/cpp/privacy-filter/CMakeLists.txt
+++ b/backend/cpp/privacy-filter/CMakeLists.txt
@@ -51,14 +51,6 @@ add_library(hw_grpc_proto STATIC
    ${HW_GRPC_SRCS} ${HW_GRPC_HDRS}
    ${HW_PROTO_SRCS} ${HW_PROTO_HDRS})
 target_include_directories(hw_grpc_proto PUBLIC ${CMAKE_CURRENT_BINARY_DIR})
-# The generated proto/grpc sources include protobuf and grpc++ headers, so this
-# library must see their include dirs. Linking the imported targets propagates
-# them. On Linux the apt headers live in /usr/include (default search path) so
-# this was a no-op; on macOS the Homebrew headers are under /opt/homebrew and
-# would otherwise be missed (runtime_version.h not found).
-target_link_libraries(hw_grpc_proto PUBLIC
-    protobuf::libprotobuf
-    gRPC::grpc++)

 # Build only the pf static lib (+ ggml) from the engine tree — no CLI/bench/tests.
 # PF_VULKAN is honored when passed on the cmake command line (it lands in the
--- a/backend/cpp/privacy-filter/run.sh
+++ b/backend/cpp/privacy-filter/run.sh
@@ -2,13 +2,7 @@
 # Entry point for the privacy-filter backend image / BACKEND_BINARY mode.
 set -e
 CURDIR=$(dirname "$(realpath "$0")")
-# macOS has no bundled ld.so; the darwin package ships only dylibs under lib/,
-# resolved via DYLD_LIBRARY_PATH (the ld.so branch below is skipped there).
-if [ "$(uname)" = "Darwin" ]; then
-    export DYLD_LIBRARY_PATH="$CURDIR/lib:$DYLD_LIBRARY_PATH"
-else
-    export LD_LIBRARY_PATH="$CURDIR/lib:$LD_LIBRARY_PATH"
-fi
+export LD_LIBRARY_PATH="$CURDIR/lib:$LD_LIBRARY_PATH"
 if [ -f "$CURDIR/lib/ld.so" ]; then
    exec "$CURDIR/lib/ld.so" "$CURDIR/grpc-server" "$@"
 fi
--- a/backend/cpp/turboquant/Makefile
+++ b/backend/cpp/turboquant/Makefile
@@ -65,29 +65,6 @@ turboquant-avx:
 turboquant-fallback:
 	$(call turboquant-build,fallback,-DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off,--target grpc-server)

-# Single-build CPU backend via ggml CPU_ALL_VARIANTS (mirrors llama-cpp-cpu-all).
-# turboquant reuses backend/cpp/llama-cpp's CMakeLists.txt (hw_grpc_proto STATIC) and
-# Makefile (SHARED_LIBS make-var + EXTRA_CMAKE_ARGS), so this passes the same overrides
-# through to the copied build: SHARED_LIBS=ON, the DL flags, and --target ggml (which
-# pulls in the per-microarch libggml-cpu-*.so via ggml's add_dependencies). The .so set
-# is collected for package.sh to bundle into package/lib.
-turboquant-cpu-all:
-	rm -rf $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build
-	cp -rf $(LLAMA_CPP_DIR) $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build
-	$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build purge
-	bash $(CURRENT_MAKEFILE_DIR)/patch-grpc-server.sh $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build/grpc-server.cpp
-	$(info $(GREEN)I turboquant build info:cpu-all-variants$(RESET))
-	LLAMA_REPO=$(LLAMA_REPO) LLAMA_VERSION=$(TURBOQUANT_VERSION) \
-	$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build llama.cpp
-	bash $(CURRENT_MAKEFILE_DIR)/apply-patches.sh $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build/llama.cpp $(PATCHES_DIR)
-	SHARED_LIBS=ON EXTRA_CMAKE_ARGS="-DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON" TARGET="--target grpc-server --target ggml" \
-	LLAMA_REPO=$(LLAMA_REPO) LLAMA_VERSION=$(TURBOQUANT_VERSION) \
-	$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build grpc-server
-	cp -rfv $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build/grpc-server turboquant-cpu-all
-	rm -rf ggml-shared-libs && mkdir -p ggml-shared-libs
-	find $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build/llama.cpp/build \( -name '*.so*' -o -name '*.dylib' \) -exec cp -av {} ggml-shared-libs/ \;
-	@echo "Collected ggml shared backends:" && ls -la ggml-shared-libs/
-
 turboquant-grpc:
 	$(call turboquant-build,grpc,-DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off,--target grpc-server --target rpc-server)

--- a/backend/cpp/turboquant/package.sh
+++ b/backend/cpp/turboquant/package.sh
@@ -14,15 +14,6 @@ mkdir -p $CURDIR/package/lib
 cp -avrf $CURDIR/turboquant-* $CURDIR/package/
 cp -rfv $CURDIR/run.sh $CURDIR/package/

-# Bundle the ggml shared backends from the CPU_ALL_VARIANTS build into package/lib. ggml
-# discovers the per-microarch libggml-cpu-*.so by scanning the executable directory, which
-# (via the bundled lib/ld.so that run.sh launches through) resolves to lib/. See the
-# matching comment in backend/cpp/llama-cpp/package.sh. No-op on the fallback/ROCm builds.
-if [ -d "$CURDIR/ggml-shared-libs" ]; then
-    echo "Bundling ggml shared backends (CPU_ALL_VARIANTS)..."
-    cp -avf $CURDIR/ggml-shared-libs/*.so* $CURDIR/package/lib/
-fi
-
 # Detect architecture and copy appropriate libraries
 if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
    # x86_64 architecture
--- a/backend/cpp/turboquant/run.sh
+++ b/backend/cpp/turboquant/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -12,39 +12,54 @@ grep -e "flags" /proc/cpuinfo | head -1

 BINARY=turboquant-fallback

-# x86/arm64 ship a single turboquant-cpu-all built with ggml CPU_ALL_VARIANTS: ggml's
-# backend registry dlopens the best libggml-cpu-*.so for this host, so no shell-side
-# probing. ROCm ships only turboquant-fallback, so fall back to it when cpu-all is absent.
-if [ -e "$CURDIR"/turboquant-cpu-all ]; then
-	BINARY=turboquant-cpu-all
+if grep -q -e "\savx\s" /proc/cpuinfo ; then
+	echo "CPU:    AVX    found OK"
+	if [ -e $CURDIR/turboquant-avx ]; then
+		BINARY=turboquant-avx
+	fi
+fi
+
+if grep -q -e "\savx2\s" /proc/cpuinfo ; then
+	echo "CPU:    AVX2   found OK"
+	if [ -e $CURDIR/turboquant-avx2 ]; then
+		BINARY=turboquant-avx2
+	fi
+fi
+
+# Check avx 512
+if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
+	echo "CPU:    AVX512F found OK"
+	if [ -e $CURDIR/turboquant-avx512 ]; then
+		BINARY=turboquant-avx512
+	fi
 fi

 if [ -n "$LLAMACPP_GRPC_SERVERS" ]; then
-	if [ -e "$CURDIR"/turboquant-grpc ]; then
+	if [ -e $CURDIR/turboquant-grpc ]; then
 		BINARY=turboquant-grpc
 	fi
 fi

 # Extend ld library path with the dir where this script is located/lib
 if [ "$(uname)" == "Darwin" ]; then
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
 else
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 	# Tell rocBLAS where to find TensileLibrary data (GPU kernel tuning files)
 	if [ -d "$CURDIR/lib/rocblas/library" ]; then
-		export ROCBLAS_TENSILE_LIBPATH="$CURDIR"/lib/rocblas/library
+		export ROCBLAS_TENSILE_LIBPATH=$CURDIR/lib/rocblas/library
 	fi
 fi

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using binary: $BINARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/$BINARY "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/$BINARY "$@"
 fi

 echo "Using binary: $BINARY"
-exec "$CURDIR"/$BINARY "$@"
+exec $CURDIR/$BINARY "$@"

 # We should never reach this point, however just in case we do, run fallback
-exec "$CURDIR"/turboquant-fallback "$@"
+exec $CURDIR/turboquant-fallback "$@"
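The run.sh hunks above replace `"$CURDIR"/...` with unquoted `$CURDIR/...`. The sketch below illustrates what that changes for command arguments, such as the `[ -f ... ]` tests and `exec` lines, when the install path contains a space. `count_args` and the path are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reports how many arguments it received.
count_args() { echo "$#"; }

# Illustrative install path containing a space (an assumption, not from the repo).
CURDIR="/opt/local ai/backends"

# Unquoted expansion is word-split on the space into two arguments.
unquoted=$(count_args $CURDIR/lib/ld.so)
# Quoted expansion stays a single argument.
quoted=$(count_args "$CURDIR"/lib/ld.so)

echo "unquoted=$unquoted quoted=$quoted"
```

With a path that has no whitespace or glob characters, both forms pass the same single argument, which is the case the unquoted scripts rely on.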
--- a/backend/go/acestep-cpp/Makefile
+++ b/backend/go/acestep-cpp/Makefile
@@ -117,8 +117,7 @@ libgoacestepcpp-custom: CMakeLists.txt cpp/goacestepcpp.cpp cpp/goacestepcpp.h
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) --target goacestepcpp && \
 	cd .. && \
-	(mv build-$(SO_TARGET)/libgoacestepcpp.so ./$(SO_TARGET) 2>/dev/null || \
-	 mv build-$(SO_TARGET)/libgoacestepcpp.dylib ./$(SO_TARGET) 2>/dev/null)
+	mv build-$(SO_TARGET)/libgoacestepcpp.so ./$(SO_TARGET)

 test: acestep-cpp
 	@echo "Running acestep-cpp tests..."
--- a/backend/go/acestep-cpp/main.go
+++ b/backend/go/acestep-cpp/main.go
@@ -4,7 +4,6 @@ package main
 import (
 	"flag"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -23,11 +22,7 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("ACESTEP_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "./libgoacestepcpp-fallback.dylib"
-		} else {
-			libName = "./libgoacestepcpp-fallback.so"
-		}
+		libName = "./libgoacestepcpp-fallback.so"
 	}

 	gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/acestep-cpp/package.sh
+++ b/backend/go/acestep-cpp/package.sh
@@ -13,7 +13,6 @@ mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/acestep-cpp $CURDIR/package/
 cp -fv $CURDIR/libgoacestepcpp-*.so $CURDIR/package/
-cp -fv $CURDIR/libgoacestepcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/acestep-cpp/run.sh
+++ b/backend/go/acestep-cpp/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -12,29 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-if [ "$(uname)" = "Darwin" ]; then
-	# macOS: single library variant (Metal or Accelerate). The goacestepcpp
-	# target is built as a CMake MODULE, which emits a .dylib for a SHARED
-	# build but a .so for a MODULE build on Apple, so prefer .dylib and fall
-	# back to .so.
-	LIBRARY="$CURDIR/libgoacestepcpp-fallback.dylib"
-	if [ ! -e "$LIBRARY" ]; then
-		LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
-	fi
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-else
-	LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
+LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"

+if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
-		if [ -e "$CURDIR"/libgoacestepcpp-avx.so ]; then
+		if [ -e $CURDIR/libgoacestepcpp-avx.so ]; then
 			LIBRARY="$CURDIR/libgoacestepcpp-avx.so"
 		fi
 	fi

 	if grep -q -e "\savx2\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX2   found OK"
-		if [ -e "$CURDIR"/libgoacestepcpp-avx2.so ]; then
+		if [ -e $CURDIR/libgoacestepcpp-avx2.so ]; then
 			LIBRARY="$CURDIR/libgoacestepcpp-avx2.so"
 		fi
 	fi
@@ -42,22 +32,21 @@ else
 	# Check avx 512
 	if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX512F found OK"
-		if [ -e "$CURDIR"/libgoacestepcpp-avx512.so ]; then
+		if [ -e $CURDIR/libgoacestepcpp-avx512.so ]; then
 			LIBRARY="$CURDIR/libgoacestepcpp-avx512.so"
 		fi
 	fi
-
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
 fi

+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export ACESTEP_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using library: $LIBRARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/acestep-cpp "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/acestep-cpp "$@"
 fi

 echo "Using library: $LIBRARY"
-exec "$CURDIR"/acestep-cpp "$@"
+exec $CURDIR/acestep-cpp "$@"
--- a/backend/go/ced/Makefile
+++ b/backend/go/ced/Makefile
@@ -57,7 +57,6 @@ libced.so: sources/ced.cpp
 	cmake -B sources/ced.cpp/build-shared -S sources/ced.cpp $(CMAKE_ARGS)
 	cmake --build sources/ced.cpp/build-shared --config Release -j$(JOBS)
 	cp -fv sources/ced.cpp/build-shared/libced.so* ./ 2>/dev/null || true
-	cp -fv sources/ced.cpp/build-shared/libced.dylib ./ 2>/dev/null || true
 	cp -fv sources/ced.cpp/include/ced_capi.h ./

 ced-grpc: libced.so main.go goced.go
--- a/backend/go/ced/main.go
+++ b/backend/go/ced/main.go
@@ -12,7 +12,6 @@ import (
 	"flag"
 	"fmt"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -28,11 +27,7 @@ type libFunc struct {
 func main() {
 	libName := os.Getenv("CED_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "libced.dylib"
-		} else {
-			libName = "libced.so"
-		}
+		libName = "libced.so"
 	}
 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
 	if err != nil {
--- a/backend/go/ced/package.sh
+++ b/backend/go/ced/package.sh
@@ -15,12 +15,10 @@ mkdir -p "$CURDIR/package/lib"
 cp -avf "$CURDIR/ced-grpc" "$CURDIR/package/"
 cp -avf "$CURDIR/run.sh" "$CURDIR/package/"

-cp -avf "$CURDIR"/libced.so* "$CURDIR/package/lib/" 2>/dev/null || true
-cp -avf "$CURDIR"/libced.dylib "$CURDIR/package/lib/" 2>/dev/null || true
-if ! ls "$CURDIR"/package/lib/libced.* >/dev/null 2>&1; then
-	echo "ERROR: libced shared library not found in $CURDIR, run 'make' first" >&2
+cp -avf "$CURDIR"/libced.so* "$CURDIR/package/lib/" 2>/dev/null || {
+	echo "ERROR: libced.so not found in $CURDIR, run 'make' first" >&2
 	exit 1
-fi
+}

 if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
    echo "Detected x86_64 architecture, copying x86_64 libraries..."
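The `cp ... || { ...; exit 1; }` form restored in this hunk fails the packaging step as soon as the glob matches nothing. Below is a minimal sketch of the same pattern. `copy_required` is a hypothetical function, and it uses `return 1` instead of `exit 1` so it can be called from a test without ending the shell.

```shell
#!/bin/bash
# Hypothetical helper, not part of package.sh: copy libced.so* from src_dir
# into dest, or report an error and fail if nothing could be copied.
copy_required() {
	local src_dir="$1" dest="$2"
	# If the glob matches nothing, the literal pattern is passed to cp,
	# cp fails, and the error branch runs.
	cp -f "$src_dir"/libced.so* "$dest/" 2>/dev/null || {
		echo "ERROR: libced.so not found in $src_dir, run 'make' first" >&2
		return 1
	}
}
```

Because stderr from `cp` is discarded, the only diagnostic a caller sees is the explicit error message.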
--- a/backend/go/ced/run.sh
+++ b/backend/go/ced/run.sh
@@ -3,12 +3,7 @@ set -e

 CURDIR=$(dirname "$(realpath "$0")")

-if [ "$(uname)" = "Darwin" ]; then
-	export DYLD_LIBRARY_PATH="$CURDIR/lib:"$CURDIR":${DYLD_LIBRARY_PATH:-}"
-	export CED_LIBRARY="$CURDIR/lib/libced.dylib"
-else
-	export LD_LIBRARY_PATH="$CURDIR/lib:"$CURDIR":${LD_LIBRARY_PATH:-}"
-fi
+export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"

 # If a self-contained ld.so was packaged, route through it so the packaged
 # libc / libstdc++ are used instead of the host's (matches the sibling backends).
--- a/backend/go/cloud-proxy/run.sh
+++ b/backend/go/cloud-proxy/run.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
 set -ex

-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

-exec "$CURDIR"/cloud-proxy "$@"
+exec $CURDIR/cloud-proxy "$@"
--- a/backend/go/crispasr/Makefile
+++ b/backend/go/crispasr/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # CrispASR version (release tag)
 CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
-CRISPASR_VERSION?=8f1218141b792b8868861c1af17ba1e361b05dc0
+CRISPASR_VERSION?=7a8cb80907341c0204bd0488c1244764f4163883
 SO_TARGET?=libgocrispasr.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -75,8 +75,7 @@ UNAME_S := $(shell uname -s)
 ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = libgocrispasr-avx.so libgocrispasr-avx2.so libgocrispasr-avx512.so libgocrispasr-fallback.so
 else
-	# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
-	VARIANT_TARGETS = libgocrispasr-fallback.dylib
+	VARIANT_TARGETS = libgocrispasr-fallback.so
 endif

 crispasr: main.go gocrispasr.go $(VARIANT_TARGETS)
@@ -88,7 +87,7 @@ package: crispasr
 build: package

 clean: purge
-	rm -rf libgocrispasr*.so libgocrispasr*.dylib package sources/CrispASR crispasr
+	rm -rf libgocrispasr*.so package sources/CrispASR crispasr

 purge:
 	rm -rf build*
@@ -119,21 +118,13 @@ libgocrispasr-fallback.so: sources/CrispASR
 	SO_TARGET=libgocrispasr-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgocrispasr-custom
 	rm -rfv build*

-# Build fallback variant as a dylib (Darwin)
-libgocrispasr-fallback.dylib: sources/CrispASR
-	$(MAKE) purge
-	$(info ${GREEN}I crispasr build info:fallback (dylib)${RESET})
-	SO_TARGET=libgocrispasr-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgocrispasr-custom
-	rm -rfv build*
-
 libgocrispasr-custom: CMakeLists.txt cpp/crispasr_shim.cpp cpp/crispasr_shim.h
 	mkdir -p build-$(SO_TARGET) && \
 	cd build-$(SO_TARGET) && \
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	(mv build-$(SO_TARGET)/libgocrispasr.so ./$(SO_TARGET) 2>/dev/null || \
-	 mv build-$(SO_TARGET)/libgocrispasr.dylib ./$(SO_TARGET) 2>/dev/null)
+	mv build-$(SO_TARGET)/libgocrispasr.so ./$(SO_TARGET)

 test: crispasr
 	CGO_ENABLED=0 $(GOCMD) test -v ./...
--- a/backend/go/crispasr/main.go
+++ b/backend/go/crispasr/main.go
@@ -4,7 +4,6 @@ package main
 import (
 	"flag"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,11 +21,7 @@ type LibFuncs struct {
 func main() {
 	libName := os.Getenv("CRISPASR_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "./libgocrispasr-fallback.dylib"
-		} else {
-			libName = "./libgocrispasr-fallback.so"
-		}
+		libName = "./libgocrispasr-fallback.so"
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/crispasr/package.sh
+++ b/backend/go/crispasr/package.sh
@@ -12,8 +12,7 @@ REPO_ROOT="${CURDIR}/../../.."
 mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/crispasr $CURDIR/package/
-cp -fv $CURDIR/libgocrispasr-*.so $CURDIR/package/ 2>/dev/null || true
-cp -fv $CURDIR/libgocrispasr-*.dylib $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/libgocrispasr-*.so $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/crispasr/run.sh
+++ b/backend/go/crispasr/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -12,23 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-if [ "$(uname)" = "Darwin" ]; then
-	# macOS: single dylib variant (Metal or Accelerate)
-	LIBRARY="$CURDIR/libgocrispasr-fallback.dylib"
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-else
-	LIBRARY="$CURDIR/libgocrispasr-fallback.so"
+LIBRARY="$CURDIR/libgocrispasr-fallback.so"

+if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
-		if [ -e "$CURDIR"/libgocrispasr-avx.so ]; then
+		if [ -e $CURDIR/libgocrispasr-avx.so ]; then
 			LIBRARY="$CURDIR/libgocrispasr-avx.so"
 		fi
 	fi

 	if grep -q -e "\savx2\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX2   found OK"
-		if [ -e "$CURDIR"/libgocrispasr-avx2.so ]; then
+		if [ -e $CURDIR/libgocrispasr-avx2.so ]; then
 			LIBRARY="$CURDIR/libgocrispasr-avx2.so"
 		fi
 	fi
@@ -36,27 +32,26 @@ else
 	# Check avx 512
 	if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX512F found OK"
-		if [ -e "$CURDIR"/libgocrispasr-avx512.so ]; then
+		if [ -e $CURDIR/libgocrispasr-avx512.so ]; then
 			LIBRARY="$CURDIR/libgocrispasr-avx512.so"
 		fi
 	fi
-
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
 fi

+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export CRISPASR_LIBRARY=$LIBRARY

 # Point piper's espeak-ng phonemizer at the bundled voice data. The variable
 # names the directory CONTAINING espeak-ng-data (package.sh drops it next to
 # this script). Harmless when espeak-ng wasn't bundled.
-export CRISPASR_ESPEAK_DATA_PATH="$CURDIR"
+export CRISPASR_ESPEAK_DATA_PATH=$CURDIR

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using library: $LIBRARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/crispasr "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/crispasr "$@"
 fi

 echo "Using library: $LIBRARY"
-exec "$CURDIR"/crispasr "$@"
+exec $CURDIR/crispasr "$@"
--- a/backend/go/depth-anything-cpp/Makefile
+++ b/backend/go/depth-anything-cpp/Makefile
@@ -40,8 +40,6 @@ else ifeq ($(BUILD_TYPE),hipblas)
 else ifeq ($(BUILD_TYPE),vulkan)
 	CMAKE_ARGS+=-DGGML_VULKAN=ON -DDA_GGML_VULKAN=ON
 else ifeq ($(OS),Darwin)
-	# macOS/Metal: built + published as an OCI image by CI (includeDarwin in
-	# .github/backend-matrix.yml) so Apple Silicon users can install this backend.
 	ifneq ($(BUILD_TYPE),metal)
 		CMAKE_ARGS+=-DGGML_METAL=OFF
 	else
@@ -79,7 +77,7 @@ ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = libdepthanythingcpp-avx.so libdepthanythingcpp-avx2.so libdepthanythingcpp-avx512.so libdepthanythingcpp-fallback.so
 else
 	# On non-Linux (e.g., Darwin), build only fallback variant
-	VARIANT_TARGETS = libdepthanythingcpp-fallback.dylib
+	VARIANT_TARGETS = libdepthanythingcpp-fallback.so
 endif

 depth-anything-cpp: main.go godepthanythingcpp.go $(VARIANT_TARGETS)
@@ -91,7 +89,7 @@ package: depth-anything-cpp
 build: package

 clean: purge
-	rm -rf libdepthanythingcpp*.so libdepthanythingcpp*.dylib depth-anything-cpp package sources
+	rm -rf libdepthanythingcpp*.so depth-anything-cpp package sources

 purge:
 	rm -rf build*
@@ -118,19 +116,11 @@ libdepthanythingcpp-avx512.so: sources/depth-anything.cpp
 endif

 # Build fallback variant (all platforms)
-ifeq ($(UNAME_S),Darwin)
-libdepthanythingcpp-fallback.dylib: sources/depth-anything.cpp
-	rm -rfv build-$@
-	$(info ${GREEN}I depth-anything-cpp build info:fallback${RESET})
-	SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
-	rm -rfv build-$@
-else
 libdepthanythingcpp-fallback.so: sources/depth-anything.cpp
 	rm -rfv build-$@
 	$(info ${GREEN}I depth-anything-cpp build info:fallback${RESET})
 	SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
 	rm -rfv build-$@
-endif

 libdepthanythingcpp-custom: CMakeLists.txt
 	mkdir -p build-$(SO_TARGET) && \
@@ -138,8 +128,7 @@ libdepthanythingcpp-custom: CMakeLists.txt
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	(mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET) 2>/dev/null || \
-	 mv build-$(SO_TARGET)/libdepthanything.dylib ./$(SO_TARGET) 2>/dev/null)
+	mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET)

 all: depth-anything-cpp package

--- a/backend/go/depth-anything-cpp/main.go
+++ b/backend/go/depth-anything-cpp/main.go
@@ -9,7 +9,6 @@ package main
 import (
 	"flag"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -28,11 +27,7 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("DEPTHANYTHING_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "./libdepthanythingcpp-fallback.dylib"
-		} else {
-			libName = "./libdepthanythingcpp-fallback.so"
-		}
+		libName = "./libdepthanythingcpp-fallback.so"
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/depth-anything-cpp/package.sh
+++ b/backend/go/depth-anything-cpp/package.sh
@@ -10,8 +10,7 @@ REPO_ROOT="${CURDIR}/../../.."
 # Create lib directory
 mkdir -p $CURDIR/package/lib

-cp -fv $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/ 2>/dev/null || true
-cp -fv $CURDIR/libdepthanythingcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
+cp -avf $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/
 cp -avf $CURDIR/depth-anything-cpp $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

--- a/backend/go/depth-anything-cpp/run.sh
+++ b/backend/go/depth-anything-cpp/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -12,23 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-if [ "$(uname)" = "Darwin" ]; then
-	# macOS: single dylib variant (Metal or Accelerate)
-	LIBRARY="$CURDIR/libdepthanythingcpp-fallback.dylib"
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-else
-	LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so"
+LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so"

+if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
-		if [ -e "$CURDIR"/libdepthanythingcpp-avx.so ]; then
+		if [ -e $CURDIR/libdepthanythingcpp-avx.so ]; then
 			LIBRARY="$CURDIR/libdepthanythingcpp-avx.so"
 		fi
 	fi

 	if grep -q -e "\savx2\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX2   found OK"
-		if [ -e "$CURDIR"/libdepthanythingcpp-avx2.so ]; then
+		if [ -e $CURDIR/libdepthanythingcpp-avx2.so ]; then
 			LIBRARY="$CURDIR/libdepthanythingcpp-avx2.so"
 		fi
 	fi
@@ -36,22 +32,21 @@ else
 	# Check avx 512
 	if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX512F found OK"
-		if [ -e "$CURDIR"/libdepthanythingcpp-avx512.so ]; then
+		if [ -e $CURDIR/libdepthanythingcpp-avx512.so ]; then
 			LIBRARY="$CURDIR/libdepthanythingcpp-avx512.so"
 		fi
 	fi
-
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
 fi

+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export DEPTHANYTHING_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using library: $LIBRARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/depth-anything-cpp "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/depth-anything-cpp "$@"
 fi

 echo "Using library: $LIBRARY"
-exec "$CURDIR"/depth-anything-cpp "$@"
+exec $CURDIR/depth-anything-cpp "$@"
--- a/backend/go/local-store/run.sh
+++ b/backend/go/local-store/run.sh
@@ -1,6 +1,6 @@
 #!/bin/bash
 set -ex

-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

-exec "$CURDIR"/local-store "$@"
+exec $CURDIR/local-store "$@"
--- a/backend/go/localvqe/Makefile
+++ b/backend/go/localvqe/Makefile
@@ -32,8 +32,6 @@ endif
 ifeq ($(BUILD_TYPE),vulkan)
 	CMAKE_ARGS+=-DGGML_VULKAN=ON -DLOCALVQE_VULKAN=ON
 else ifeq ($(OS),Darwin)
-	# Apple Silicon: CPU-only (no Metal upstream); built + published as an arm64
-	# image by CI (includeDarwin in .github/backend-matrix.yml) for macOS install.
 	CMAKE_ARGS+=-DGGML_METAL=OFF
 endif

@@ -69,9 +67,8 @@ $(LIB_SENTINEL): sources/LocalVQE
 	# that the loader picks at runtime. We must build every target — the
 	# default `--target localvqe_shared` drops these. CMAKE_LIBRARY_OUTPUT_DIRECTORY
 	# routes all of them into build/bin; copy them out next to the binary.
-	cp -P build/bin/liblocalvqe.so* . 2>/dev/null || cp -P build/bin/liblocalvqe.dylib . 2>/dev/null || cp -P build/liblocalvqe.so* . 2>/dev/null || cp -P build/liblocalvqe.dylib .
+	cp -P build/bin/liblocalvqe.so* . 2>/dev/null || cp -P build/liblocalvqe.so* .
 	cp -P build/bin/libggml*.so* . 2>/dev/null || true
-	cp -P build/bin/libggml*.dylib . 2>/dev/null || true
 	touch $(LIB_SENTINEL)

 liblocalvqe.so: $(LIB_SENTINEL)
--- a/backend/go/localvqe/main.go
+++ b/backend/go/localvqe/main.go
@@ -4,7 +4,6 @@ package main
 import (
 	"flag"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,11 +21,7 @@ type LibFuncs struct {
 func main() {
 	libName := os.Getenv("LOCALVQE_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "./liblocalvqe.dylib"
-		} else {
-			libName = "./liblocalvqe.so"
-		}
+		libName = "./liblocalvqe.so"
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/localvqe/package.sh
+++ b/backend/go/localvqe/package.sh
@@ -15,9 +15,7 @@ cp -avf $CURDIR/localvqe $CURDIR/package/
 # liblocalvqe.so* (with SOVERSION symlinks) and the libggml-*.so runtime
 # variants — LocalVQE picks the matching CPU variant at load time.
 cp -P $CURDIR/liblocalvqe.so* $CURDIR/package/ 2>/dev/null || true
-cp -P $CURDIR/liblocalvqe.dylib $CURDIR/package/ 2>/dev/null || true
 cp -P $CURDIR/libggml*.so* $CURDIR/package/ 2>/dev/null || true
-cp -P $CURDIR/libggml*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/localvqe/run.sh
+++ b/backend/go/localvqe/run.sh
@@ -1,34 +1,23 @@
 #!/bin/bash
 set -ex

-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 # LocalVQE's runtime CPU-variant loader (ggml_backend_load_all) searches
 # get_executable_path() and current_path() — the second one is what saves us
 # when /proc/self/exe resolves to lib/ld.so under the bundled-loader path.
-# So we cd into "$CURDIR" (where all the libggml-cpu-*.so files live) before
+# So we cd into $CURDIR (where all the libggml-cpu-*.so files live) before
 # exec'ing the binary.
 cd "$CURDIR"

-if [ "$(uname)" = "Darwin" ]; then
-	# macOS: LocalVQE is built as a SHARED library, so dyld needs the .dylib +
-	# DYLD_LIBRARY_PATH. Prefer .dylib and fall back to .so just in case.
-	export DYLD_LIBRARY_PATH="$CURDIR":"$CURDIR"/lib:$DYLD_LIBRARY_PATH
-	LOCALVQE_LIBRARY="$CURDIR"/liblocalvqe.dylib
-	if [ ! -e "$LOCALVQE_LIBRARY" ]; then
-		LOCALVQE_LIBRARY="$CURDIR"/liblocalvqe.so
-	fi
-	export LOCALVQE_LIBRARY
-else
-	export LD_LIBRARY_PATH="$CURDIR":"$CURDIR"/lib:$LD_LIBRARY_PATH
-	export LOCALVQE_LIBRARY="$CURDIR"/liblocalvqe.so
-fi
+export LD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$LD_LIBRARY_PATH
+export LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so

-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using library: $LOCALVQE_LIBRARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/localvqe "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/localvqe "$@"
 fi

 echo "Using library: $LOCALVQE_LIBRARY"
-exec "$CURDIR"/localvqe "$@"
+exec $CURDIR/localvqe "$@"
--- a/backend/go/locate-anything-cpp/Makefile
+++ b/backend/go/locate-anything-cpp/Makefile
@@ -33,8 +33,6 @@ else ifeq ($(BUILD_TYPE),hipblas)
 else ifeq ($(BUILD_TYPE),vulkan)
 	CMAKE_ARGS+=-DGGML_VULKAN=ON -DLA_GGML_VULKAN=ON
 else ifeq ($(OS),Darwin)
-	# macOS/Metal: built + published as an OCI image by CI (includeDarwin in
-	# .github/backend-matrix.yml) so Apple Silicon users can install this backend.
 	ifneq ($(BUILD_TYPE),metal)
 		CMAKE_ARGS+=-DGGML_METAL=OFF
 	else
@@ -72,7 +70,7 @@ ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = liblocateanythingcpp-avx.so liblocateanythingcpp-avx2.so liblocateanythingcpp-avx512.so liblocateanythingcpp-fallback.so
 else
 	# On non-Linux (e.g., Darwin), build only fallback variant
-	VARIANT_TARGETS = liblocateanythingcpp-fallback.dylib
+	VARIANT_TARGETS = liblocateanythingcpp-fallback.so
 endif

 locate-anything-cpp: main.go golocateanythingcpp.go $(VARIANT_TARGETS)
@@ -84,7 +82,7 @@ package: locate-anything-cpp
 build: package

 clean: purge
-	rm -rf liblocateanythingcpp*.so liblocateanythingcpp*.dylib locate-anything-cpp package sources
+	rm -rf liblocateanythingcpp*.so locate-anything-cpp package sources

 purge:
 	rm -rf build*
@@ -111,19 +109,11 @@ liblocateanythingcpp-avx512.so: sources/locate-anything.cpp
 endif

 # Build fallback variant (all platforms)
-ifeq ($(UNAME_S),Darwin)
-liblocateanythingcpp-fallback.dylib: sources/locate-anything.cpp
-	rm -rfv build-$@
-	$(info ${GREEN}I locate-anything-cpp build info:fallback${RESET})
-	SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom
-	rm -rfv build-$@
-else
 liblocateanythingcpp-fallback.so: sources/locate-anything.cpp
 	rm -rfv build-$@
 	$(info ${GREEN}I locate-anything-cpp build info:fallback${RESET})
 	SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom
 	rm -rfv build-$@
-endif

 liblocateanythingcpp-custom: CMakeLists.txt
 	mkdir -p build-$(SO_TARGET) && \
@@ -131,8 +121,7 @@ liblocateanythingcpp-custom: CMakeLists.txt
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	(mv build-$(SO_TARGET)/liblocateanythingcpp.so ./$(SO_TARGET) 2>/dev/null || \
-	 mv build-$(SO_TARGET)/liblocateanythingcpp.dylib ./$(SO_TARGET) 2>/dev/null)
+	mv build-$(SO_TARGET)/liblocateanythingcpp.so ./$(SO_TARGET)

 all: locate-anything-cpp package

--- a/backend/go/locate-anything-cpp/main.go
+++ b/backend/go/locate-anything-cpp/main.go
@@ -9,7 +9,6 @@ package main
 import (
 	"flag"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -28,11 +27,7 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("LOCATEANYTHING_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "./liblocateanythingcpp-fallback.dylib"
-		} else {
-			libName = "./liblocateanythingcpp-fallback.so"
-		}
+		libName = "./liblocateanythingcpp-fallback.so"
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/locate-anything-cpp/package.sh
+++ b/backend/go/locate-anything-cpp/package.sh
@@ -10,8 +10,7 @@ REPO_ROOT="${CURDIR}/../../.."
 # Create lib directory
 mkdir -p $CURDIR/package/lib

-cp -fv $CURDIR/liblocateanythingcpp-*.so $CURDIR/package/ 2>/dev/null || true
-cp -fv $CURDIR/liblocateanythingcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
+cp -avf $CURDIR/liblocateanythingcpp-*.so $CURDIR/package/
 cp -avf $CURDIR/locate-anything-cpp $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

--- a/backend/go/locate-anything-cpp/run.sh
+++ b/backend/go/locate-anything-cpp/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -12,23 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-if [ "$(uname)" = "Darwin" ]; then
-	# macOS: single dylib variant (Metal or Accelerate)
-	LIBRARY="$CURDIR/liblocateanythingcpp-fallback.dylib"
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-else
-	LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so"
+LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so"

+if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
-		if [ -e "$CURDIR"/liblocateanythingcpp-avx.so ]; then
+		if [ -e $CURDIR/liblocateanythingcpp-avx.so ]; then
 			LIBRARY="$CURDIR/liblocateanythingcpp-avx.so"
 		fi
 	fi

 	if grep -q -e "\savx2\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX2   found OK"
-		if [ -e "$CURDIR"/liblocateanythingcpp-avx2.so ]; then
+		if [ -e $CURDIR/liblocateanythingcpp-avx2.so ]; then
 			LIBRARY="$CURDIR/liblocateanythingcpp-avx2.so"
 		fi
 	fi
@@ -36,22 +32,21 @@ else
 	# Check avx 512
 	if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX512F found OK"
-		if [ -e "$CURDIR"/liblocateanythingcpp-avx512.so ]; then
+		if [ -e $CURDIR/liblocateanythingcpp-avx512.so ]; then
 			LIBRARY="$CURDIR/liblocateanythingcpp-avx512.so"
 		fi
 	fi
-
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
 fi

+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export LOCATEANYTHING_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using library: $LIBRARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/locate-anything-cpp "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/locate-anything-cpp "$@"
 fi

 echo "Using library: $LIBRARY"
-exec "$CURDIR"/locate-anything-cpp "$@"
+exec $CURDIR/locate-anything-cpp "$@"
--- a/backend/go/omnivoice-cpp/Makefile
+++ b/backend/go/omnivoice-cpp/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # omnivoice.cpp version
 OMNIVOICE_REPO?=https://github.com/ServeurpersoCom/omnivoice.cpp
-OMNIVOICE_VERSION?=0f37401bebe9b20c0160a888e592108fc1d17607
+OMNIVOICE_VERSION?=96d30169afd5e6bb3fd6a0e9be0eb505bfe81fcd
 SO_TARGET?=libgomnivoicecpp.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -65,8 +65,7 @@ UNAME_S := $(shell uname -s)
 ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = libgomnivoicecpp-avx.so libgomnivoicecpp-avx2.so libgomnivoicecpp-avx512.so libgomnivoicecpp-fallback.so
 else
-	# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
-	VARIANT_TARGETS = libgomnivoicecpp-fallback.dylib
+	VARIANT_TARGETS = libgomnivoicecpp-fallback.so
 endif

 omnivoice-cpp: main.go gomnivoicecpp.go $(VARIANT_TARGETS)
@@ -78,7 +77,7 @@ package: omnivoice-cpp
 build: package

 clean: purge
-	rm -rf libgomnivoicecpp*.so libgomnivoicecpp*.dylib package sources/omnivoice.cpp omnivoice-cpp
+	rm -rf libgomnivoicecpp*.so package sources/omnivoice.cpp omnivoice-cpp

 purge:
 	rm -rf build*
@@ -107,20 +106,13 @@ libgomnivoicecpp-fallback.so: sources/omnivoice.cpp
 	SO_TARGET=libgomnivoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
 	rm -rf build-libgomnivoicecpp-fallback.so

-# Build fallback variant as a dylib (Darwin)
-libgomnivoicecpp-fallback.dylib: sources/omnivoice.cpp
-	$(info ${GREEN}I omnivoice-cpp build info:fallback (dylib)${RESET})
-	SO_TARGET=libgomnivoicecpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
-	rm -rf build-libgomnivoicecpp-fallback.dylib
-
 libgomnivoicecpp-custom: CMakeLists.txt cpp/gomnivoicecpp.cpp cpp/gomnivoicecpp.h
 	mkdir -p build-$(SO_TARGET) && \
 	cd build-$(SO_TARGET) && \
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) --target gomnivoicecpp && \
 	cd .. && \
-	(mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET) 2>/dev/null || \
-	 mv build-$(SO_TARGET)/libgomnivoicecpp.dylib ./$(SO_TARGET) 2>/dev/null)
+	mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET)

 test: omnivoice-cpp
 	@echo "Running omnivoice-cpp tests..."
--- a/backend/go/omnivoice-cpp/main.go
+++ b/backend/go/omnivoice-cpp/main.go
@@ -4,7 +4,6 @@ package main
 import (
 	"flag"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,11 +21,7 @@ type LibFuncs struct {
 func main() {
 	libName := os.Getenv("OMNIVOICE_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "./libgomnivoicecpp-fallback.dylib"
-		} else {
-			libName = "./libgomnivoicecpp-fallback.so"
-		}
+		libName = "./libgomnivoicecpp-fallback.so"
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/omnivoice-cpp/package.sh
+++ b/backend/go/omnivoice-cpp/package.sh
@@ -12,8 +12,7 @@ REPO_ROOT="${CURDIR}/../../.."
 mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/omnivoice-cpp $CURDIR/package/
-cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/ 2>/dev/null || true
-cp -fv $CURDIR/libgomnivoicecpp-*.dylib $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/omnivoice-cpp/run.sh
+++ b/backend/go/omnivoice-cpp/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -12,23 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-if [ "$(uname)" = "Darwin" ]; then
-	# macOS: single dylib variant (Metal or Accelerate)
-	LIBRARY="$CURDIR/libgomnivoicecpp-fallback.dylib"
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-else
-	LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so"
+LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so"

+if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
-		if [ -e "$CURDIR"/libgomnivoicecpp-avx.so ]; then
+		if [ -e $CURDIR/libgomnivoicecpp-avx.so ]; then
 			LIBRARY="$CURDIR/libgomnivoicecpp-avx.so"
 		fi
 	fi

 	if grep -q -e "\savx2\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX2   found OK"
-		if [ -e "$CURDIR"/libgomnivoicecpp-avx2.so ]; then
+		if [ -e $CURDIR/libgomnivoicecpp-avx2.so ]; then
 			LIBRARY="$CURDIR/libgomnivoicecpp-avx2.so"
 		fi
 	fi
@@ -36,22 +32,21 @@ else
 	# Check avx 512
 	if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX512F found OK"
-		if [ -e "$CURDIR"/libgomnivoicecpp-avx512.so ]; then
+		if [ -e $CURDIR/libgomnivoicecpp-avx512.so ]; then
 			LIBRARY="$CURDIR/libgomnivoicecpp-avx512.so"
 		fi
 	fi
-
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
 fi

+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export OMNIVOICE_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using library: $LIBRARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/omnivoice-cpp "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/omnivoice-cpp "$@"
 fi

 echo "Using library: $LIBRARY"
-exec "$CURDIR"/omnivoice-cpp "$@"
+exec $CURDIR/omnivoice-cpp "$@"
--- a/backend/go/opus/run.sh
+++ b/backend/go/opus/run.sh
@@ -1,15 +1,15 @@
 #!/bin/bash
 set -ex

-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

-export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
-export OPUS_SHIM_LIBRARY="$CURDIR"/lib/libopusshim.so
+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
+export OPUS_SHIM_LIBRARY=$CURDIR/lib/libopusshim.so

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/opus "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/opus "$@"
 fi

-exec "$CURDIR"/opus "$@"
+exec $CURDIR/opus "$@"
--- a/backend/go/parakeet-cpp/Makefile
+++ b/backend/go/parakeet-cpp/Makefile
@@ -1,6 +1,6 @@
 # parakeet-cpp backend Makefile.
 #
-# Upstream pin lives below as PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
+# Upstream pin lives below as PARAKEET_VERSION?=db755a78d39f789bb7d4e3935158a9e8105dbe36
 # (.github/bump_deps.sh) can find and update it - matches the
 # whisper.cpp / ds4 / vibevoice-cpp convention.
 #
@@ -15,7 +15,7 @@
 # That's what the L0 smoke test uses. The default target below does the
 # proper clone-at-pin + cmake build so CI doesn't need a side-checkout.

-PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
+PARAKEET_VERSION?=db755a78d39f789bb7d4e3935158a9e8105dbe36
 PARAKEET_REPO?=https://github.com/mudler/parakeet.cpp

 GOCMD?=go
@@ -74,7 +74,6 @@ libparakeet.so: sources/parakeet.cpp
 	cmake -B sources/parakeet.cpp/build-shared -S sources/parakeet.cpp $(CMAKE_ARGS)
 	cmake --build sources/parakeet.cpp/build-shared --config Release -j$(JOBS)
 	cp -fv sources/parakeet.cpp/build-shared/libparakeet.so* ./ 2>/dev/null || true
-	cp -fv sources/parakeet.cpp/build-shared/libparakeet.dylib ./ 2>/dev/null || true
 	cp -fv sources/parakeet.cpp/include/parakeet_capi.h ./

 parakeet-cpp-grpc: libparakeet.so main.go goparakeetcpp.go
--- a/backend/go/parakeet-cpp/main.go
+++ b/backend/go/parakeet-cpp/main.go
@@ -2,17 +2,15 @@ package main

 // Started internally by LocalAI - one gRPC server per loaded model.
 //
-// Loads the parakeet shared library via purego and registers the flat
-// C-API entry points declared in parakeet_capi.h. The library name can be
-// overridden with PARAKEET_LIBRARY (mirrors the WHISPER_LIBRARY /
-// VIBEVOICECPP_LIBRARY convention in the sibling backends); the default
-// looks next to this binary for libparakeet.so on Linux and
-// libparakeet.dylib on macOS.
+// Loads libparakeet.so via purego and registers the flat C-API entry
+// points declared in parakeet_capi.h. The library name can be overridden
+// with PARAKEET_LIBRARY (mirrors the WHISPER_LIBRARY / VIBEVOICECPP_LIBRARY
+// convention in the sibling backends); the default looks for the .so next
+// to this binary.
 import (
 	"flag"
 	"fmt"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -30,11 +28,7 @@ type LibFuncs struct {
 func main() {
 	libName := os.Getenv("PARAKEET_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "libparakeet.dylib"
-		} else {
-			libName = "libparakeet.so"
-		}
+		libName = "libparakeet.so"
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/parakeet-cpp/package.sh
+++ b/backend/go/parakeet-cpp/package.sh
@@ -16,15 +16,12 @@ mkdir -p "$CURDIR/package/lib"
 cp -avf "$CURDIR/parakeet-cpp-grpc" "$CURDIR/package/"
 cp -avf "$CURDIR/run.sh" "$CURDIR/package/"

-# libparakeet shared lib + any soname symlinks. On Linux this is
-# libparakeet.so[.X.Y]; on macOS it is libparakeet.dylib. purego.Dlopen
-# resolves it via the *_LIBRARY_PATH that run.sh points at lib/.
-cp -avf "$CURDIR"/libparakeet.so* "$CURDIR/package/lib/" 2>/dev/null || true
-cp -avf "$CURDIR"/libparakeet.dylib "$CURDIR/package/lib/" 2>/dev/null || true
-if ! ls "$CURDIR"/package/lib/libparakeet.* >/dev/null 2>&1; then
-	echo "ERROR: libparakeet shared library not found in $CURDIR, run 'make' first" >&2
+# libparakeet.so + any soname symlinks (libparakeet.so.X[.Y]). purego.Dlopen
+# resolves it via LD_LIBRARY_PATH, which run.sh points at lib/.
+cp -avf "$CURDIR"/libparakeet.so* "$CURDIR/package/lib/" 2>/dev/null || {
+	echo "ERROR: libparakeet.so not found in $CURDIR, run 'make' first" >&2
 	exit 1
-fi
+}

 # Detect architecture and copy the core runtime libs libparakeet.so links
 # against, plus the matching dynamic loader as lib/ld.so.
@@ -51,7 +48,7 @@ elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
    cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
    cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
 elif [ "$(uname -s)" = "Darwin" ]; then
-    echo "Detected Darwin — system frameworks linked dynamically, no bundled libs needed"
+    echo "Detected Darwin"
 else
    echo "Error: Could not detect architecture"
    exit 1
--- a/backend/go/parakeet-cpp/run.sh
+++ b/backend/go/parakeet-cpp/run.sh
@@ -3,17 +3,11 @@ set -e

 CURDIR=$(dirname "$(realpath "$0")")

-if [ "$(uname)" = "Darwin" ]; then
-	export DYLD_LIBRARY_PATH="$CURDIR/lib:"$CURDIR":${DYLD_LIBRARY_PATH:-}"
-	export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.dylib"
-else
-	export LD_LIBRARY_PATH="$CURDIR/lib:"$CURDIR":${LD_LIBRARY_PATH:-}"
-	export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.so"
-fi
+export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"

 # If a self-contained ld.so was packaged, route through it so the
 # packaged libc / libstdc++ are used instead of the host's (matches the
-# whisper backend's runtime layout). Linux only.
+# whisper backend's runtime layout).
 if [ -f "$CURDIR/lib/ld.so" ]; then
 	echo "Using lib/ld.so"
 	exec "$CURDIR/lib/ld.so" "$CURDIR/parakeet-cpp-grpc" "$@"
--- a/backend/go/piper/run.sh
+++ b/backend/go/piper/run.sh
@@ -1,15 +1,15 @@
 #!/bin/bash
 set -ex

-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

-export ESPEAK_NG_DATA="$CURDIR"/espeak-ng-data
-export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
+export ESPEAK_NG_DATA=$CURDIR/espeak-ng-data
+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/piper "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/piper "$@"
 fi

-exec "$CURDIR"/piper "$@"
+exec $CURDIR/piper "$@"
--- a/backend/go/qwen3-tts-cpp/Makefile
+++ b/backend/go/qwen3-tts-cpp/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # qwentts.cpp version
 QWEN3TTS_REPO?=https://github.com/ServeurpersoCom/qwentts.cpp
-QWEN3TTS_CPP_VERSION?=9dbe7ea26a01b30fccb117ae5e86807c1dc23d42
+QWEN3TTS_CPP_VERSION?=4536dcdce27c3764a93a06d6bf64026b124962f5
 SO_TARGET?=libgoqwen3ttscpp.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -65,8 +65,8 @@ UNAME_S := $(shell uname -s)
 ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = libgoqwen3ttscpp-avx.so libgoqwen3ttscpp-avx2.so libgoqwen3ttscpp-avx512.so libgoqwen3ttscpp-fallback.so
 else
-	# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
-	VARIANT_TARGETS = libgoqwen3ttscpp-fallback.dylib
+	# On non-Linux (e.g., Darwin), build only fallback variant
+	VARIANT_TARGETS = libgoqwen3ttscpp-fallback.so
 endif

 qwen3-tts-cpp: main.go goqwen3ttscpp.go $(VARIANT_TARGETS)
@@ -78,7 +78,7 @@ package: qwen3-tts-cpp
 build: package

 clean: purge
-	rm -rf libgoqwen3ttscpp*.so libgoqwen3ttscpp*.dylib package sources/qwentts.cpp qwen3-tts-cpp
+	rm -rf libgoqwen3ttscpp*.so package sources/qwentts.cpp qwen3-tts-cpp

 purge:
 	rm -rf build*
@@ -110,20 +110,13 @@ libgoqwen3ttscpp-fallback.so: sources/qwentts.cpp
 	SO_TARGET=libgoqwen3ttscpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
 	rm -rf build-libgoqwen3ttscpp-fallback.so

-# Build fallback variant as a dylib (Darwin)
-libgoqwen3ttscpp-fallback.dylib: sources/qwentts.cpp
-	$(info ${GREEN}I qwen3-tts-cpp build info:fallback (dylib)${RESET})
-	SO_TARGET=libgoqwen3ttscpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
-	rm -rf build-libgoqwen3ttscpp-fallback.dylib
-
 libgoqwen3ttscpp-custom: CMakeLists.txt cpp/goqwen3ttscpp.cpp cpp/goqwen3ttscpp.h
 	mkdir -p build-$(SO_TARGET) && \
 	cd build-$(SO_TARGET) && \
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) --target goqwen3ttscpp && \
 	cd .. && \
-	(mv build-$(SO_TARGET)/libgoqwen3ttscpp.so ./$(SO_TARGET) 2>/dev/null || \
-	 mv build-$(SO_TARGET)/libgoqwen3ttscpp.dylib ./$(SO_TARGET) 2>/dev/null)
+	mv build-$(SO_TARGET)/libgoqwen3ttscpp.so ./$(SO_TARGET)

 test: qwen3-tts-cpp
 	@echo "Running qwen3-tts-cpp tests..."
--- a/backend/go/qwen3-tts-cpp/main.go
+++ b/backend/go/qwen3-tts-cpp/main.go
@@ -4,7 +4,6 @@ package main
 import (
 	"flag"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,11 +21,7 @@ type LibFuncs struct {
 func main() {
 	libName := os.Getenv("QWEN3TTS_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "./libgoqwen3ttscpp-fallback.dylib"
-		} else {
-			libName = "./libgoqwen3ttscpp-fallback.so"
-		}
+		libName = "./libgoqwen3ttscpp-fallback.so"
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/qwen3-tts-cpp/package.sh
+++ b/backend/go/qwen3-tts-cpp/package.sh
@@ -12,8 +12,7 @@ REPO_ROOT="${CURDIR}/../../.."
 mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/qwen3-tts-cpp $CURDIR/package/
-cp -fv $CURDIR/libgoqwen3ttscpp-*.so $CURDIR/package/ 2>/dev/null || true
-cp -fv $CURDIR/libgoqwen3ttscpp-*.dylib $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/libgoqwen3ttscpp-*.so $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/qwen3-tts-cpp/run.sh
+++ b/backend/go/qwen3-tts-cpp/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -12,23 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-if [ "$(uname)" = "Darwin" ]; then
-	# macOS: single dylib variant (Metal or Accelerate)
-	LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.dylib"
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-else
-	LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.so"
+LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.so"

+if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
-		if [ -e "$CURDIR"/libgoqwen3ttscpp-avx.so ]; then
+		if [ -e $CURDIR/libgoqwen3ttscpp-avx.so ]; then
 			LIBRARY="$CURDIR/libgoqwen3ttscpp-avx.so"
 		fi
 	fi

 	if grep -q -e "\savx2\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX2   found OK"
-		if [ -e "$CURDIR"/libgoqwen3ttscpp-avx2.so ]; then
+		if [ -e $CURDIR/libgoqwen3ttscpp-avx2.so ]; then
 			LIBRARY="$CURDIR/libgoqwen3ttscpp-avx2.so"
 		fi
 	fi
@@ -36,22 +32,21 @@ else
 	# Check avx 512
 	if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX512F found OK"
-		if [ -e "$CURDIR"/libgoqwen3ttscpp-avx512.so ]; then
+		if [ -e $CURDIR/libgoqwen3ttscpp-avx512.so ]; then
 			LIBRARY="$CURDIR/libgoqwen3ttscpp-avx512.so"
 		fi
 	fi
-
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
 fi

+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export QWEN3TTS_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using library: $LIBRARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/qwen3-tts-cpp "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/qwen3-tts-cpp "$@"
 fi

 echo "Using library: $LIBRARY"
-exec "$CURDIR"/qwen3-tts-cpp "$@"
+exec $CURDIR/qwen3-tts-cpp "$@"
--- a/backend/go/rfdetr-cpp/Makefile
+++ b/backend/go/rfdetr-cpp/Makefile
@@ -34,8 +34,6 @@ else ifeq ($(BUILD_TYPE),hipblas)
 else ifeq ($(BUILD_TYPE),vulkan)
 	CMAKE_ARGS+=-DGGML_VULKAN=ON -DRFDETR_GGML_VULKAN=ON
 else ifeq ($(OS),Darwin)
-	# macOS/Metal: built + published as an OCI image by CI (includeDarwin in
-	# .github/backend-matrix.yml) so Apple Silicon users can install this backend.
 	ifneq ($(BUILD_TYPE),metal)
 		CMAKE_ARGS+=-DGGML_METAL=OFF
 	else
@@ -73,7 +71,7 @@ ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = librfdetrcpp-avx.so librfdetrcpp-avx2.so librfdetrcpp-avx512.so librfdetrcpp-fallback.so
 else
 	# On non-Linux (e.g., Darwin), build only fallback variant
-	VARIANT_TARGETS = librfdetrcpp-fallback.dylib
+	VARIANT_TARGETS = librfdetrcpp-fallback.so
 endif

 rfdetr-cpp: main.go gorfdetrcpp.go $(VARIANT_TARGETS)
@@ -85,7 +83,7 @@ package: rfdetr-cpp
 build: package

 clean: purge
-	rm -rf librfdetrcpp*.so librfdetrcpp*.dylib rfdetr-cpp package sources
+	rm -rf librfdetrcpp*.so rfdetr-cpp package sources

 purge:
 	rm -rf build*
@@ -112,19 +110,11 @@ librfdetrcpp-avx512.so: sources/rt-detr.cpp
 endif

 # Build fallback variant (all platforms)
-ifeq ($(UNAME_S),Darwin)
-librfdetrcpp-fallback.dylib: sources/rt-detr.cpp
-	rm -rfv build-$@
-	$(info ${GREEN}I rfdetr-cpp build info:fallback${RESET})
-	SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) librfdetrcpp-custom
-	rm -rfv build-$@
-else
 librfdetrcpp-fallback.so: sources/rt-detr.cpp
 	rm -rfv build-$@
 	$(info ${GREEN}I rfdetr-cpp build info:fallback${RESET})
 	SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) librfdetrcpp-custom
 	rm -rfv build-$@
-endif

 librfdetrcpp-custom: CMakeLists.txt
 	mkdir -p build-$(SO_TARGET) && \
@@ -132,8 +122,7 @@ librfdetrcpp-custom: CMakeLists.txt
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	(mv build-$(SO_TARGET)/librfdetrcpp.so ./$(SO_TARGET) 2>/dev/null || \
-	 mv build-$(SO_TARGET)/librfdetrcpp.dylib ./$(SO_TARGET) 2>/dev/null)
+	mv build-$(SO_TARGET)/librfdetrcpp.so ./$(SO_TARGET)

 all: rfdetr-cpp package

--- a/backend/go/rfdetr-cpp/main.go
+++ b/backend/go/rfdetr-cpp/main.go
@@ -9,7 +9,6 @@ package main
 import (
 	"flag"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -28,11 +27,7 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("RFDETR_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "./librfdetrcpp-fallback.dylib"
-		} else {
-			libName = "./librfdetrcpp-fallback.so"
-		}
+		libName = "./librfdetrcpp-fallback.so"
 	}

 	rfdetrLib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/rfdetr-cpp/package.sh
+++ b/backend/go/rfdetr-cpp/package.sh
@@ -10,8 +10,7 @@ REPO_ROOT="${CURDIR}/../../.."
 # Create lib directory
 mkdir -p $CURDIR/package/lib

-cp -fv $CURDIR/librfdetrcpp-*.so $CURDIR/package/ 2>/dev/null || true
-cp -fv $CURDIR/librfdetrcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
+cp -avf $CURDIR/librfdetrcpp-*.so $CURDIR/package/
 cp -avf $CURDIR/rfdetr-cpp $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

--- a/backend/go/rfdetr-cpp/run.sh
+++ b/backend/go/rfdetr-cpp/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -12,23 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-if [ "$(uname)" = "Darwin" ]; then
-	# macOS: single dylib variant (Metal or Accelerate)
-	LIBRARY="$CURDIR/librfdetrcpp-fallback.dylib"
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-else
-	LIBRARY="$CURDIR/librfdetrcpp-fallback.so"
+LIBRARY="$CURDIR/librfdetrcpp-fallback.so"

+if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
-		if [ -e "$CURDIR"/librfdetrcpp-avx.so ]; then
+		if [ -e $CURDIR/librfdetrcpp-avx.so ]; then
 			LIBRARY="$CURDIR/librfdetrcpp-avx.so"
 		fi
 	fi

 	if grep -q -e "\savx2\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX2   found OK"
-		if [ -e "$CURDIR"/librfdetrcpp-avx2.so ]; then
+		if [ -e $CURDIR/librfdetrcpp-avx2.so ]; then
 			LIBRARY="$CURDIR/librfdetrcpp-avx2.so"
 		fi
 	fi
@@ -36,22 +32,21 @@ else
 	# Check avx 512
 	if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX512F found OK"
-		if [ -e "$CURDIR"/librfdetrcpp-avx512.so ]; then
+		if [ -e $CURDIR/librfdetrcpp-avx512.so ]; then
 			LIBRARY="$CURDIR/librfdetrcpp-avx512.so"
 		fi
 	fi
-
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
 fi

+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export RFDETR_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using library: $LIBRARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/rfdetr-cpp "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/rfdetr-cpp "$@"
 fi

 echo "Using library: $LIBRARY"
-exec "$CURDIR"/rfdetr-cpp "$@"
+exec $CURDIR/rfdetr-cpp "$@"
--- a/backend/go/sam3-cpp/Makefile
+++ b/backend/go/sam3-cpp/Makefile
@@ -31,8 +31,6 @@ else ifeq ($(BUILD_TYPE),hipblas)
 else ifeq ($(BUILD_TYPE),vulkan)
 	CMAKE_ARGS+=-DGGML_VULKAN=ON
 else ifeq ($(OS),Darwin)
-	# macOS/Metal: built + published as an OCI image by CI (includeDarwin in
-	# .github/backend-matrix.yml) so Apple Silicon users can install this backend.
 	ifneq ($(BUILD_TYPE),metal)
 		CMAKE_ARGS+=-DGGML_METAL=OFF
 	else
@@ -68,7 +66,7 @@ ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = libgosam3-avx.so libgosam3-avx2.so libgosam3-avx512.so libgosam3-fallback.so
 else
 	# On non-Linux (e.g., Darwin), build only fallback variant
-	VARIANT_TARGETS = libgosam3-fallback.dylib
+	VARIANT_TARGETS = libgosam3-fallback.so
 endif

 sam3-cpp: main.go gosam3.go $(VARIANT_TARGETS)
@@ -80,7 +78,7 @@ package: sam3-cpp
 build: package

 clean: purge
-	rm -rf libgosam3*.so libgosam3*.dylib sam3-cpp package sources
+	rm -rf libgosam3*.so sam3-cpp package sources

 purge:
 	rm -rf build*
@@ -107,19 +105,11 @@ libgosam3-avx512.so: sources/sam3.cpp
 endif

 # Build fallback variant (all platforms)
-ifeq ($(UNAME_S),Darwin)
-libgosam3-fallback.dylib: sources/sam3.cpp
-	$(MAKE) purge
-	$(info ${GREEN}I sam3-cpp build info:fallback${RESET})
-	SO_TARGET=libgosam3-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosam3-custom
-	rm -rfv build*
-else
 libgosam3-fallback.so: sources/sam3.cpp
 	$(MAKE) purge
 	$(info ${GREEN}I sam3-cpp build info:fallback${RESET})
 	SO_TARGET=libgosam3-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosam3-custom
 	rm -rfv build*
-endif

 libgosam3-custom: CMakeLists.txt cpp/gosam3.cpp cpp/gosam3.h
 	mkdir -p build-$(SO_TARGET) && \
@@ -127,7 +117,6 @@ libgosam3-custom: CMakeLists.txt cpp/gosam3.cpp cpp/gosam3.h
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	(mv build-$(SO_TARGET)/libgosam3.so ./$(SO_TARGET) 2>/dev/null || \
-	 mv build-$(SO_TARGET)/libgosam3.dylib ./$(SO_TARGET) 2>/dev/null)
+	mv build-$(SO_TARGET)/libgosam3.so ./$(SO_TARGET)

 all: sam3-cpp package
--- a/backend/go/sam3-cpp/main.go
+++ b/backend/go/sam3-cpp/main.go
@@ -3,7 +3,6 @@ package main
 import (
 	"flag"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,11 +21,7 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("SAM3_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "./libgosam3-fallback.dylib"
-		} else {
-			libName = "./libgosam3-fallback.so"
-		}
+		libName = "./libgosam3-fallback.so"
 	}

 	gosamLib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/sam3-cpp/package.sh
+++ b/backend/go/sam3-cpp/package.sh
@@ -10,8 +10,7 @@ REPO_ROOT="${CURDIR}/../../.."
 # Create lib directory
 mkdir -p $CURDIR/package/lib

-cp -fv $CURDIR/libgosam3-*.so $CURDIR/package/ 2>/dev/null || true
-cp -fv $CURDIR/libgosam3-*.dylib $CURDIR/package/ 2>/dev/null || true
+cp -avf $CURDIR/libgosam3-*.so $CURDIR/package/
 cp -avf $CURDIR/sam3-cpp $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

--- a/backend/go/sam3-cpp/run.sh
+++ b/backend/go/sam3-cpp/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -12,23 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-if [ "$(uname)" = "Darwin" ]; then
-	# macOS: single dylib variant (Metal or Accelerate)
-	LIBRARY="$CURDIR/libgosam3-fallback.dylib"
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-else
-	LIBRARY="$CURDIR/libgosam3-fallback.so"
+LIBRARY="$CURDIR/libgosam3-fallback.so"

+if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
-		if [ -e "$CURDIR"/libgosam3-avx.so ]; then
+		if [ -e $CURDIR/libgosam3-avx.so ]; then
 			LIBRARY="$CURDIR/libgosam3-avx.so"
 		fi
 	fi

 	if grep -q -e "\savx2\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX2   found OK"
-		if [ -e "$CURDIR"/libgosam3-avx2.so ]; then
+		if [ -e $CURDIR/libgosam3-avx2.so ]; then
 			LIBRARY="$CURDIR/libgosam3-avx2.so"
 		fi
 	fi
@@ -36,22 +32,21 @@ else
 	# Check avx 512
 	if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX512F found OK"
-		if [ -e "$CURDIR"/libgosam3-avx512.so ]; then
+		if [ -e $CURDIR/libgosam3-avx512.so ]; then
 			LIBRARY="$CURDIR/libgosam3-avx512.so"
 		fi
 	fi
-
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
 fi

+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export SAM3_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using library: $LIBRARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/sam3-cpp "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/sam3-cpp "$@"
 fi

 echo "Using library: $LIBRARY"
-exec "$CURDIR"/sam3-cpp "$@"
+exec $CURDIR/sam3-cpp "$@"
--- a/backend/go/sherpa-onnx/backend.go
+++ b/backend/go/sherpa-onnx/backend.go
@@ -7,7 +7,6 @@ import (
 	"fmt"
 	"os"
 	"path/filepath"
-	"runtime"
 	"strconv"
 	"strings"
 	"sync"
@@ -239,19 +238,11 @@ func loadSherpaLibs() error {
 func loadSherpaLibsOnce() error {
 	shimLib := os.Getenv("SHERPA_SHIM_LIBRARY")
 	if shimLib == "" {
-		if runtime.GOOS == "darwin" {
-			shimLib = "libsherpa-shim.dylib"
-		} else {
-			shimLib = "libsherpa-shim.so"
-		}
+		shimLib = "libsherpa-shim.so"
 	}
 	capiLib := os.Getenv("SHERPA_ONNX_LIBRARY")
 	if capiLib == "" {
-		if runtime.GOOS == "darwin" {
-			capiLib = "libsherpa-onnx-c-api.dylib"
-		} else {
-			capiLib = "libsherpa-onnx-c-api.so"
-		}
+		capiLib = "libsherpa-onnx-c-api.so"
 	}

 	shim, err := purego.Dlopen(shimLib, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/sherpa-onnx/run.sh
+++ b/backend/go/sherpa-onnx/run.sh
@@ -1,19 +1,13 @@
 #!/bin/bash
 set -ex

-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

-if [ "$(uname)" = "Darwin" ]; then
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-	export SHERPA_SHIM_LIBRARY="$CURDIR"/lib/libsherpa-shim.dylib
-	export SHERPA_ONNX_LIBRARY="$CURDIR"/lib/libsherpa-onnx-c-api.dylib
-else
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
-fi
+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH

-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/sherpa-onnx "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/sherpa-onnx "$@"
 fi

-exec "$CURDIR"/sherpa-onnx "$@"
+exec $CURDIR/sherpa-onnx "$@"
--- a/backend/go/silero-vad/run.sh
+++ b/backend/go/silero-vad/run.sh
@@ -1,14 +1,14 @@
 #!/bin/bash
 set -ex

-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

-export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/silero-vad "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/silero-vad "$@"
 fi

-exec "$CURDIR"/silero-vad "$@"
+exec $CURDIR/silero-vad "$@"
--- a/backend/go/stablediffusion-ggml/Makefile
+++ b/backend/go/stablediffusion-ggml/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # stablediffusion.cpp (ggml)
 STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=8caa3f908ae6d4a4bef531e73b9a969f266a3d1f
+STABLEDIFFUSION_GGML_VERSION?=b12098f5d09fc83da36e65c784f7bdb16a5a5ebf

 CMAKE_ARGS+=-DGGML_MAX_NAME=128

@@ -131,7 +131,6 @@ libgosd-custom: CMakeLists.txt cpp/gosd.cpp cpp/gosd.h
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	(mv build-$(SO_TARGET)/libgosd.so ./$(SO_TARGET) 2>/dev/null || \
-	 mv build-$(SO_TARGET)/libgosd.dylib ./$(SO_TARGET) 2>/dev/null)
+	mv build-$(SO_TARGET)/libgosd.so ./$(SO_TARGET)

 all: stablediffusion-ggml package
--- a/backend/go/stablediffusion-ggml/main.go
+++ b/backend/go/stablediffusion-ggml/main.go
@@ -3,7 +3,6 @@ package main
 import (
 	"flag"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,11 +21,7 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("SD_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "./libgosd-fallback.dylib"
-		} else {
-			libName = "./libgosd-fallback.so"
-		}
+		libName = "./libgosd-fallback.so"
 	}

 	gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/stablediffusion-ggml/package.sh
+++ b/backend/go/stablediffusion-ggml/package.sh
@@ -12,7 +12,6 @@ REPO_ROOT="${CURDIR}/../../.."
 mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/libgosd-*.so $CURDIR/package/
-cp -fv $CURDIR/libgosd-*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -avf $CURDIR/stablediffusion-ggml $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

--- a/backend/go/stablediffusion-ggml/run.sh
+++ b/backend/go/stablediffusion-ggml/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -12,28 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-if [ "$(uname)" = "Darwin" ]; then
-	# macOS: single library variant (Metal or Accelerate). The gosd target is
-	# built as a CMake MODULE, which emits a .dylib for a SHARED build but a
-	# .so for a MODULE build on Apple, so prefer .dylib and fall back to .so.
-	LIBRARY="$CURDIR/libgosd-fallback.dylib"
-	if [ ! -e "$LIBRARY" ]; then
-		LIBRARY="$CURDIR/libgosd-fallback.so"
-	fi
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-else
-	LIBRARY="$CURDIR/libgosd-fallback.so"
+LIBRARY="$CURDIR/libgosd-fallback.so"

+if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
-		if [ -e "$CURDIR"/libgosd-avx.so ]; then
+		if [ -e $CURDIR/libgosd-avx.so ]; then
 			LIBRARY="$CURDIR/libgosd-avx.so"
 		fi
 	fi

 	if grep -q -e "\savx2\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX2   found OK"
-		if [ -e "$CURDIR"/libgosd-avx2.so ]; then
+		if [ -e $CURDIR/libgosd-avx2.so ]; then
 			LIBRARY="$CURDIR/libgosd-avx2.so"
 		fi
 	fi
@@ -41,22 +32,21 @@ else
 	# Check avx 512
 	if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX512F found OK"
-		if [ -e "$CURDIR"/libgosd-avx512.so ]; then
+		if [ -e $CURDIR/libgosd-avx512.so ]; then
 			LIBRARY="$CURDIR/libgosd-avx512.so"
 		fi
 	fi
-
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
 fi

+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export SD_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using library: $LIBRARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/stablediffusion-ggml "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/stablediffusion-ggml "$@"
 fi

 echo "Using library: $LIBRARY"
-exec "$CURDIR"/stablediffusion-ggml "$@"
+exec $CURDIR/stablediffusion-ggml "$@"
--- a/backend/go/supertonic/helper.go
+++ b/backend/go/supertonic/helper.go
@@ -16,7 +16,6 @@ import (
 	"os"
 	"path/filepath"
 	"regexp"
-	"runtime"
 	"strings"
 	"time"
 	"unicode"
@@ -944,13 +943,7 @@ func InitializeONNXRuntime() error {
 			}
 		}
 		if libPath == "" {
-			// LocalAI: default to the platform-native shared library
-			// extension when nothing else is found (dyld vs ld.so).
-			if runtime.GOOS == "darwin" {
-				libPath = "/usr/local/lib/libonnxruntime.dylib"
-			} else {
-				libPath = "/usr/local/lib/libonnxruntime.so"
-			}
+			libPath = "/usr/local/lib/libonnxruntime.so"
 		}
 	}
 	ort.SetSharedLibraryPath(libPath)
--- a/backend/go/supertonic/package.sh
+++ b/backend/go/supertonic/package.sh
@@ -32,10 +32,6 @@ elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
    cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
    cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
    cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
-elif [ $(uname -s) = "Darwin" ]; then
-    # macOS: dyld resolves the bundled .dylib via DYLD_LIBRARY_PATH (set in
-    # run.sh); there is no ld.so loader nor glibc to bundle.
-    echo "Detected Darwin"
 else
    echo "Error: Could not detect architecture"
    exit 1
--- a/backend/go/supertonic/run.sh
+++ b/backend/go/supertonic/run.sh
@@ -1,21 +1,14 @@
 #!/bin/bash
 set -ex

-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

-if [ "$(uname)" = "Darwin" ]; then
-	# macOS uses dyld: there is no ld.so loader, and the search path env
-	# var is DYLD_LIBRARY_PATH. ONNX Runtime ships as a .dylib here.
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-	export ONNXRUNTIME_LIB_PATH="$CURDIR"/lib/libonnxruntime.dylib
-else
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
-	export ONNXRUNTIME_LIB_PATH="$CURDIR"/lib/libonnxruntime.so
+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
+export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.so

-	if [ -f "$CURDIR"/lib/ld.so ]; then
-		echo "Using lib/ld.so"
-		exec "$CURDIR"/lib/ld.so "$CURDIR"/supertonic "$@"
-	fi
+if [ -f $CURDIR/lib/ld.so ]; then
+	echo "Using lib/ld.so"
+	exec $CURDIR/lib/ld.so $CURDIR/supertonic "$@"
 fi

-exec "$CURDIR"/supertonic "$@"
+exec $CURDIR/supertonic "$@"
--- a/backend/go/vibevoice-cpp/Makefile
+++ b/backend/go/vibevoice-cpp/Makefile
@@ -70,8 +70,8 @@ UNAME_S := $(shell uname -s)
 ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = libgovibevoicecpp-avx.so libgovibevoicecpp-avx2.so libgovibevoicecpp-avx512.so libgovibevoicecpp-fallback.so
 else
-	# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
-	VARIANT_TARGETS = libgovibevoicecpp-fallback.dylib
+	# On non-Linux (e.g., Darwin), build only fallback variant
+	VARIANT_TARGETS = libgovibevoicecpp-fallback.so
 endif

 vibevoice-cpp: main.go govibevoicecpp.go $(VARIANT_TARGETS)
@@ -83,7 +83,7 @@ package: vibevoice-cpp
 build: package

 clean: purge
-	rm -rf libgovibevoicecpp*.so libgovibevoicecpp*.dylib package sources/vibevoice.cpp vibevoice-cpp
+	rm -rf libgovibevoicecpp*.so package sources/vibevoice.cpp vibevoice-cpp

 purge:
 	rm -rf build*
@@ -119,21 +119,13 @@ libgovibevoicecpp-fallback.so: sources/vibevoice.cpp
 	SO_TARGET=libgovibevoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgovibevoicecpp-custom
 	rm -rfv build*

-# Build fallback variant as a dylib (Darwin)
-libgovibevoicecpp-fallback.dylib: sources/vibevoice.cpp
-	$(MAKE) purge
-	$(info ${GREEN}I vibevoice-cpp build info:fallback (dylib)${RESET})
-	SO_TARGET=libgovibevoicecpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgovibevoicecpp-custom
-	rm -rfv build*
-
 libgovibevoicecpp-custom: CMakeLists.txt cpp/govibevoicecpp.cpp cpp/govibevoicecpp.h
 	mkdir -p build-$(SO_TARGET) && \
 	cd build-$(SO_TARGET) && \
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) --target govibevoicecpp && \
 	cd .. && \
-	(mv build-$(SO_TARGET)/libgovibevoicecpp.so ./$(SO_TARGET) 2>/dev/null || \
-	 mv build-$(SO_TARGET)/libgovibevoicecpp.dylib ./$(SO_TARGET) 2>/dev/null)
+	mv build-$(SO_TARGET)/libgovibevoicecpp.so ./$(SO_TARGET)

 test: vibevoice-cpp
 	@echo "Running vibevoice-cpp tests..."
--- a/backend/go/vibevoice-cpp/main.go
+++ b/backend/go/vibevoice-cpp/main.go
@@ -4,7 +4,6 @@ package main
 import (
 	"flag"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,11 +21,7 @@ type LibFuncs struct {
 func main() {
 	libName := os.Getenv("VIBEVOICECPP_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "./libgovibevoicecpp-fallback.dylib"
-		} else {
-			libName = "./libgovibevoicecpp-fallback.so"
-		}
+		libName = "./libgovibevoicecpp-fallback.so"
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/vibevoice-cpp/package.sh
+++ b/backend/go/vibevoice-cpp/package.sh
@@ -12,8 +12,7 @@ REPO_ROOT="${CURDIR}/../../.."
 mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/vibevoice-cpp $CURDIR/package/
-cp -fv $CURDIR/libgovibevoicecpp-*.so $CURDIR/package/ 2>/dev/null || true
-cp -fv $CURDIR/libgovibevoicecpp-*.dylib $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/libgovibevoicecpp-*.so $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/vibevoice-cpp/run.sh
+++ b/backend/go/vibevoice-cpp/run.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
 set -ex

-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -11,44 +11,39 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-if [ "$(uname)" = "Darwin" ]; then
-	# macOS: single dylib variant (Metal or Accelerate)
-	LIBRARY="$CURDIR/libgovibevoicecpp-fallback.dylib"
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-else
-	LIBRARY="$CURDIR/libgovibevoicecpp-fallback.so"
+LIBRARY="$CURDIR/libgovibevoicecpp-fallback.so"

+if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
-		if [ -e "$CURDIR"/libgovibevoicecpp-avx.so ]; then
+		if [ -e $CURDIR/libgovibevoicecpp-avx.so ]; then
 			LIBRARY="$CURDIR/libgovibevoicecpp-avx.so"
 		fi
 	fi

 	if grep -q -e "\savx2\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX2   found OK"
-		if [ -e "$CURDIR"/libgovibevoicecpp-avx2.so ]; then
+		if [ -e $CURDIR/libgovibevoicecpp-avx2.so ]; then
 			LIBRARY="$CURDIR/libgovibevoicecpp-avx2.so"
 		fi
 	fi

 	if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX512F found OK"
-		if [ -e "$CURDIR"/libgovibevoicecpp-avx512.so ]; then
+		if [ -e $CURDIR/libgovibevoicecpp-avx512.so ]; then
 			LIBRARY="$CURDIR/libgovibevoicecpp-avx512.so"
 		fi
 	fi
-
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
 fi

+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export VIBEVOICECPP_LIBRARY=$LIBRARY

-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using library: $LIBRARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/vibevoice-cpp "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/vibevoice-cpp "$@"
 fi

 echo "Using library: $LIBRARY"
-exec "$CURDIR"/vibevoice-cpp "$@"
+exec $CURDIR/vibevoice-cpp "$@"
--- a/backend/go/voxtral/run.sh
+++ b/backend/go/voxtral/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -15,35 +15,35 @@ fi
 if [ "$(uname)" = "Darwin" ]; then
 	# macOS: single dylib variant (Metal or Accelerate)
 	LIBRARY="$CURDIR/libgovoxtral-fallback.dylib"
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
 else
 	LIBRARY="$CURDIR/libgovoxtral-fallback.so"

 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
-		if [ -e "$CURDIR"/libgovoxtral-avx.so ]; then
+		if [ -e $CURDIR/libgovoxtral-avx.so ]; then
 			LIBRARY="$CURDIR/libgovoxtral-avx.so"
 		fi
 	fi

 	if grep -q -e "\savx2\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX2   found OK"
-		if [ -e "$CURDIR"/libgovoxtral-avx2.so ]; then
+		if [ -e $CURDIR/libgovoxtral-avx2.so ]; then
 			LIBRARY="$CURDIR/libgovoxtral-avx2.so"
 		fi
 	fi

-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 fi

 export VOXTRAL_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it (Linux only)
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using library: $LIBRARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/voxtral "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/voxtral "$@"
 fi

 echo "Using library: $LIBRARY"
-exec "$CURDIR"/voxtral "$@"
+exec $CURDIR/voxtral "$@"
--- a/backend/go/whisper/Makefile
+++ b/backend/go/whisper/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # whisper.cpp version
 WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=43d78af5be58f41d6ffbc227d608f104577741ea
+WHISPER_CPP_VERSION?=5ed76e9a079962f1c85cfce44edd325c27ef1f97
 SO_TARGET?=libgowhisper.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -117,7 +117,6 @@ libgowhisper-custom: CMakeLists.txt cpp/gowhisper.cpp cpp/gowhisper.h
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	mv build-$(SO_TARGET)/libgowhisper.so ./$(SO_TARGET) 2>/dev/null || \
-		mv build-$(SO_TARGET)/libgowhisper.dylib ./$(SO_TARGET:.so=.dylib)
+	mv build-$(SO_TARGET)/libgowhisper.so ./$(SO_TARGET)

 all: whisper package
--- a/backend/go/whisper/main.go
+++ b/backend/go/whisper/main.go
@@ -4,7 +4,6 @@ package main
 import (
 	"flag"
 	"os"
-	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -23,11 +22,7 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("WHISPER_LIBRARY")
 	if libName == "" {
-		if runtime.GOOS == "darwin" {
-			libName = "./libgowhisper-fallback.dylib"
-		} else {
-			libName = "./libgowhisper-fallback.so"
-		}
+		libName = "./libgowhisper-fallback.so"
 	}

 	gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/whisper/package.sh
+++ b/backend/go/whisper/package.sh
@@ -12,8 +12,7 @@ REPO_ROOT="${CURDIR}/../../.."
 mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/whisper $CURDIR/package/
-cp -fv $CURDIR/libgowhisper-*.so $CURDIR/package/ 2>/dev/null || true
-cp -fv $CURDIR/libgowhisper-*.dylib $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/libgowhisper-*.so $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/whisper/run.sh
+++ b/backend/go/whisper/run.sh
@@ -2,7 +2,7 @@
 set -ex

 # Get the absolute current dir where the script is located
-CURDIR=$(dirname "$(realpath "$0")")
+CURDIR=$(dirname "$(realpath $0)")

 cd /

@@ -12,23 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-if [ "$(uname)" = "Darwin" ]; then
-	# macOS: single dylib variant (Metal or Accelerate)
-	LIBRARY="$CURDIR/libgowhisper-fallback.dylib"
-	export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
-else
-	LIBRARY="$CURDIR/libgowhisper-fallback.so"
+LIBRARY="$CURDIR/libgowhisper-fallback.so"

+if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
-		if [ -e "$CURDIR"/libgowhisper-avx.so ]; then
+		if [ -e $CURDIR/libgowhisper-avx.so ]; then
 			LIBRARY="$CURDIR/libgowhisper-avx.so"
 		fi
 	fi

 	if grep -q -e "\savx2\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX2   found OK"
-		if [ -e "$CURDIR"/libgowhisper-avx2.so ]; then
+		if [ -e $CURDIR/libgowhisper-avx2.so ]; then
 			LIBRARY="$CURDIR/libgowhisper-avx2.so"
 		fi
 	fi
@@ -36,22 +32,21 @@ else
 	# Check avx 512
 	if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX512F found OK"
-		if [ -e "$CURDIR"/libgowhisper-avx512.so ]; then
+		if [ -e $CURDIR/libgowhisper-avx512.so ]; then
 			LIBRARY="$CURDIR/libgowhisper-avx512.so"
 		fi
 	fi
-
-	export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
 fi

+export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export WHISPER_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
-if [ -f "$CURDIR"/lib/ld.so ]; then
+if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
 	echo "Using library: $LIBRARY"
-	exec "$CURDIR"/lib/ld.so "$CURDIR"/whisper "$@"
+	exec $CURDIR/lib/ld.so $CURDIR/whisper "$@"
 fi

 echo "Using library: $LIBRARY"
-exec "$CURDIR"/whisper "$@"
+exec $CURDIR/whisper "$@"
--- a/backend/index.yaml
+++ b/backend/index.yaml
@@ -340,7 +340,6 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sam3-cpp"
    intel: "intel-sycl-f32-sam3-cpp"
    vulkan: "vulkan-sam3-cpp"
-    metal: "metal-sam3-cpp"
 - &rfdetrcpp
  name: "rfdetr-cpp"
  alias: "rfdetr-cpp"
@@ -369,7 +368,6 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-rfdetr-cpp"
    intel: "intel-sycl-f32-rfdetr-cpp"
    vulkan: "vulkan-rfdetr-cpp"
-    metal: "metal-rfdetr-cpp"
 - &locateanything
  name: "locate-anything"
  alias: "locate-anything"
@@ -399,7 +397,6 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-locate-anything-cpp"
    intel: "intel-sycl-f32-locate-anything-cpp"
    vulkan: "vulkan-locate-anything-cpp"
-    metal: "metal-locate-anything-cpp"
 - !!merge <<: *locateanything
  name: "locate-anything-development"
  capabilities:
@@ -412,7 +409,6 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-locate-anything-cpp-development"
    intel: "intel-sycl-f32-locate-anything-cpp-development"
    vulkan: "vulkan-locate-anything-cpp-development"
-    metal: "metal-locate-anything-cpp-development"
 - !!merge <<: *locateanything
  name: "cpu-locate-anything-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-locate-anything-cpp"
@@ -423,16 +419,6 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-locate-anything-cpp"
  mirrors:
    - localai/localai-backends:master-cpu-locate-anything-cpp
- !!merge <<: *locateanything
-  name: "metal-locate-anything-cpp"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-locate-anything-cpp"
-  mirrors:
-    - localai/localai-backends:latest-metal-darwin-arm64-locate-anything-cpp
- !!merge <<: *locateanything
-  name: "metal-locate-anything-cpp-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-locate-anything-cpp"
-  mirrors:
-    - localai/localai-backends:master-metal-darwin-arm64-locate-anything-cpp
 - !!merge <<: *locateanything
  name: "cuda12-locate-anything-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-locate-anything-cpp"
@@ -531,7 +517,6 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-depth-anything-cpp"
    intel: "intel-sycl-f32-depth-anything-cpp"
    vulkan: "vulkan-depth-anything-cpp"
-    metal: "metal-depth-anything-cpp"
 - !!merge <<: *depthanything
  name: "depth-anything-development"
  capabilities:
@@ -544,7 +529,6 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-depth-anything-cpp-development"
    intel: "intel-sycl-f32-depth-anything-cpp-development"
    vulkan: "vulkan-depth-anything-cpp-development"
-    metal: "metal-depth-anything-cpp-development"
 - !!merge <<: *depthanything
  name: "cpu-depth-anything-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-depth-anything-cpp"
@@ -555,16 +539,6 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-depth-anything-cpp"
  mirrors:
    - localai/localai-backends:master-cpu-depth-anything-cpp
- !!merge <<: *depthanything
-  name: "metal-depth-anything-cpp"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-depth-anything-cpp"
-  mirrors:
-    - localai/localai-backends:latest-metal-darwin-arm64-depth-anything-cpp
- !!merge <<: *depthanything
-  name: "metal-depth-anything-cpp-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-depth-anything-cpp"
-  mirrors:
-    - localai/localai-backends:master-metal-darwin-arm64-depth-anything-cpp
 - !!merge <<: *depthanything
  name: "cuda12-depth-anything-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-depth-anything-cpp"
@@ -671,7 +645,6 @@
    nvidia-cuda-13: "cuda13-vllm"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm"
    cpu: "cpu-vllm"
-    metal: "metal-vllm"
 - &sglang
  name: "sglang"
  license: apache-2.0
@@ -1057,8 +1030,6 @@
    nvidia-l4t: "vulkan-localvqe"
    nvidia-l4t-cuda-12: "vulkan-localvqe"
    nvidia-l4t-cuda-13: "vulkan-localvqe"
-    # Apple Silicon: CPU build (LocalVQE has no Metal path); still arm64-native.
-    metal: "metal-localvqe"
 - &privacyfilter
  name: "privacy-filter"
  alias: "privacy-filter"
@@ -1095,7 +1066,6 @@
    amd: "vulkan-privacy-filter"
    intel: "vulkan-privacy-filter"
    vulkan: "vulkan-privacy-filter"
-    metal: "metal-privacy-filter"
 - &faster-whisper
  icon: https://avatars.githubusercontent.com/u/1520500?s=200&v=4
  description: |
@@ -1314,7 +1284,6 @@
    nvidia-cuda-13: "cuda13-liquid-audio"
    nvidia-cuda-12: "cuda12-liquid-audio"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-liquid-audio"
-    metal: "metal-liquid-audio"
  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/61b8e2ba285851687028d395/7_6D7rWrLxp2hb6OHSV1p.png
 - &qwen-tts
  urls:
@@ -1600,7 +1569,6 @@
    - TTS
  capabilities:
    default: "cpu-supertonic"
-    metal: "metal-supertonic"
 - !!merge <<: *neutts
  name: "neutts-development"
  capabilities:
@@ -2938,16 +2906,6 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-privacy-filter"
  mirrors:
    - localai/localai-backends:master-gpu-vulkan-privacy-filter
- !!merge <<: *privacyfilter
-  name: "metal-privacy-filter"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-privacy-filter"
-  mirrors:
-    - localai/localai-backends:latest-metal-darwin-arm64-privacy-filter
- !!merge <<: *privacyfilter
-  name: "metal-privacy-filter-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-privacy-filter"
-  mirrors:
-    - localai/localai-backends:master-metal-darwin-arm64-privacy-filter
 - !!merge <<: *privacyfilter
  name: "cuda13-privacy-filter"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-privacy-filter"
@@ -2969,17 +2927,6 @@
    nvidia-cuda-13: "cuda13-vllm-development"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-development"
    cpu: "cpu-vllm-development"
-    metal: "metal-vllm-development"
- !!merge <<: *vllm
-  name: "metal-vllm"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-vllm"
-  mirrors:
-    - localai/localai-backends:latest-metal-darwin-arm64-vllm
- !!merge <<: *vllm
-  name: "metal-vllm-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-vllm"
-  mirrors:
-    - localai/localai-backends:master-metal-darwin-arm64-vllm
 - !!merge <<: *vllm
  name: "cuda12-vllm"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm"
@@ -3259,7 +3206,6 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sam3-cpp-development"
    intel: "intel-sycl-f32-sam3-cpp-development"
    vulkan: "vulkan-sam3-cpp-development"
-    metal: "metal-sam3-cpp-development"
 - !!merge <<: *sam3cpp
  name: "cpu-sam3-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-sam3-cpp"
@@ -3270,16 +3216,6 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-sam3-cpp"
  mirrors:
    - localai/localai-backends:master-cpu-sam3-cpp
- !!merge <<: *sam3cpp
-  name: "metal-sam3-cpp"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-sam3-cpp"
-  mirrors:
-    - localai/localai-backends:latest-metal-darwin-arm64-sam3-cpp
- !!merge <<: *sam3cpp
-  name: "metal-sam3-cpp-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-sam3-cpp"
-  mirrors:
-    - localai/localai-backends:master-metal-darwin-arm64-sam3-cpp
 - !!merge <<: *sam3cpp
  name: "cuda12-sam3-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-sam3-cpp"
@@ -3353,7 +3289,6 @@
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-rfdetr-cpp-development"
    intel: "intel-sycl-f32-rfdetr-cpp-development"
    vulkan: "vulkan-rfdetr-cpp-development"
-    metal: "metal-rfdetr-cpp-development"
 - !!merge <<: *rfdetrcpp
  name: "cpu-rfdetr-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-rfdetr-cpp"
@@ -3364,16 +3299,6 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-rfdetr-cpp"
  mirrors:
    - localai/localai-backends:master-cpu-rfdetr-cpp
- !!merge <<: *rfdetrcpp
-  name: "metal-rfdetr-cpp"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-rfdetr-cpp"
-  mirrors:
-    - localai/localai-backends:latest-metal-darwin-arm64-rfdetr-cpp
- !!merge <<: *rfdetrcpp
-  name: "metal-rfdetr-cpp-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-rfdetr-cpp"
-  mirrors:
-    - localai/localai-backends:master-metal-darwin-arm64-rfdetr-cpp
 - !!merge <<: *rfdetrcpp
  name: "cuda12-rfdetr-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-rfdetr-cpp"
@@ -4162,16 +4087,6 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-localvqe"
  mirrors:
    - localai/localai-backends:master-gpu-vulkan-localvqe
- !!merge <<: *localvqecpp
-  name: "metal-localvqe"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-localvqe"
-  mirrors:
-    - localai/localai-backends:latest-metal-darwin-arm64-localvqe
- !!merge <<: *localvqecpp
-  name: "metal-localvqe-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-localvqe"
-  mirrors:
-    - localai/localai-backends:master-metal-darwin-arm64-localvqe
 ## kokoro
 - !!merge <<: *kokoro
  name: "kokoro-development"
@@ -4697,7 +4612,6 @@
    nvidia-cuda-13: "cuda13-liquid-audio-development"
    nvidia-cuda-12: "cuda12-liquid-audio-development"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-liquid-audio-development"
-    metal: "metal-liquid-audio-development"
 - !!merge <<: *liquid-audio
  name: "cpu-liquid-audio"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-liquid-audio"
@@ -4708,16 +4622,6 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-liquid-audio"
  mirrors:
    - localai/localai-backends:master-cpu-liquid-audio
- !!merge <<: *liquid-audio
-  name: "metal-liquid-audio"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-liquid-audio"
-  mirrors:
-    - localai/localai-backends:latest-metal-darwin-arm64-liquid-audio
- !!merge <<: *liquid-audio
-  name: "metal-liquid-audio-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-liquid-audio"
-  mirrors:
-    - localai/localai-backends:master-metal-darwin-arm64-liquid-audio
 - !!merge <<: *liquid-audio
  name: "cuda12-liquid-audio"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-liquid-audio"
@@ -5378,7 +5282,6 @@
    nvidia: "cuda12-trl"
    nvidia-cuda-12: "cuda12-trl"
    nvidia-cuda-13: "cuda13-trl"
-    metal: "metal-trl"
 ## TRL backend images
 - !!merge <<: *trl
  name: "cpu-trl"
@@ -5410,16 +5313,6 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-trl"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-13-trl
- !!merge <<: *trl
-  name: "metal-trl"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-trl"
-  mirrors:
-    - localai/localai-backends:latest-metal-darwin-arm64-trl
- !!merge <<: *trl
-  name: "metal-trl-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-trl"
-  mirrors:
-    - localai/localai-backends:master-metal-darwin-arm64-trl
 ## llama.cpp quantization backend
 - &llama-cpp-quantization
  name: "llama-cpp-quantization"
@@ -5591,7 +5484,6 @@
  name: "supertonic-development"
  capabilities:
    default: "cpu-supertonic-development"
-    metal: "metal-supertonic-development"
 - !!merge <<: *supertonic
  name: "cpu-supertonic"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-supertonic"
@@ -5602,13 +5494,3 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-supertonic"
  mirrors:
    - localai/localai-backends:master-cpu-supertonic
- !!merge <<: *supertonic
-  name: "metal-supertonic"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-supertonic"
-  mirrors:
-    - localai/localai-backends:latest-metal-darwin-arm64-supertonic
- !!merge <<: *supertonic
-  name: "metal-supertonic-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-supertonic"
-  mirrors:
-    - localai/localai-backends:master-metal-darwin-arm64-supertonic
--- a/backend/python/liquid-audio/install.sh
+++ b/backend/python/liquid-audio/install.sh
@@ -14,11 +14,5 @@ else
 fi

 # liquid-audio's torch wheels are large; allow upgrades to satisfy transitive pins
-EXTRA_PIP_INSTALL_FLAGS+=" --upgrade"
-# --index-strategy is a uv-only flag. The darwin/MPS build installs with pip
-# (USE_PIP=true in scripts/build/python-darwin.sh), which rejects it. Only add
-# it on the uv path; Linux/CUDA resolution is unchanged.
-if [ "x${USE_PIP:-}" != "xtrue" ]; then
-    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-first-match"
-fi
+EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
 installRequirements
--- a/backend/python/liquid-audio/requirements-mps.txt
+++ b/backend/python/liquid-audio/requirements-mps.txt
@@ -1,4 +1,3 @@
-# MPS (Apple Silicon / Metal) build profile - installed by the darwin CI job.
 torch>=2.8.0
 torchaudio>=2.8.0
 torchcodec>=0.9.1
--- a/backend/python/trl/install.sh
+++ b/backend/python/trl/install.sh
@@ -8,13 +8,7 @@ else
    source $backend_dir/../common/libbackend.sh
 fi

-EXTRA_PIP_INSTALL_FLAGS+=" --upgrade"
-# --index-strategy is a uv-only flag. The darwin/MPS build installs with pip
-# (USE_PIP=true in scripts/build/python-darwin.sh), which rejects it. Only add
-# it when uv is the installer, keeping the Linux/CUDA resolution unchanged.
-if [ "x${USE_PIP:-}" != "xtrue" ]; then
-    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-first-match"
-fi
+EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
 installRequirements

 # Fetch convert_hf_to_gguf.py and gguf package from the same llama.cpp version
--- a/backend/python/trl/requirements-mps.txt
+++ b/backend/python/trl/requirements-mps.txt
@@ -1,12 +0,0 @@
-torch==2.10.0
-trl
-peft
-datasets>=3.0.0
-transformers>=4.56.2
-accelerate>=1.4.0
-huggingface-hub>=1.3.0
-sentencepiece
-# Note: bitsandbytes is intentionally omitted on MPS. It is only used by the
-# CUDA (cublas) variants for 8-bit/4-bit quantization and has poor support on
-# Apple Silicon. torch here uses the plain PyPI wheels, which ship MPS support
-# on macOS arm64.
--- a/backend/python/vllm/backend.py
+++ b/backend/python/vllm/backend.py
@@ -457,14 +457,9 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
                    except Exception:
                        pass

-                _pl = getattr(last_output, "prompt_logprobs", None) if last_output is not None else None
-                # Some engines accept the prompt_logprobs request but return a
-                # list of all-None entries instead of computing them (observed
-                # with vllm-metal's MLX backend on macOS). Treat that as
-                # unsupported rather than silently scoring every candidate as 0.
-                if not _pl or all(e is None for e in _pl):
-                    context.set_code(grpc.StatusCode.UNIMPLEMENTED)
-                    context.set_details("This backend did not return prompt_logprobs; scoring is unsupported on this engine (e.g. vllm-metal / MLX on macOS).")
+                if last_output is None or not getattr(last_output, "prompt_logprobs", None):
+                    context.set_code(grpc.StatusCode.INTERNAL)
+                    context.set_details("vLLM did not return prompt_logprobs")
                    return backend_pb2.ScoreResponse()

                prompt_logprobs = last_output.prompt_logprobs
--- a/backend/python/vllm/install.sh
+++ b/backend/python/vllm/install.sh
@@ -43,24 +43,6 @@ if [ "x${BUILD_PROFILE}" == "xcublas13" ]; then
    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
 fi

-# Apple Silicon (Metal/MLX) via vllm-metal.
-# vllm-metal (github.com/vllm-project/vllm-metal) brings vLLM to macOS on Apple
-# Silicon: it registers through vLLM's platform-plugin entry point
-# (metal -> vllm_metal:register), MetalPlatform activates, and the vLLM v1
-# AsyncLLM engine runs on the GPU through MLX. LocalAI's backend.py is UNCHANGED
-# on darwin — AsyncEngineArgs(...) -> AsyncLLMEngine.from_engine_args transparently
-# resolves to the MLX engine (proven on a real M4 / macOS 26.5 against Qwen3-0.6B).
-#
-# vllm-metal REQUIRES Python 3.12, so force the portable CPython before the venv
-# is created (ensureVenv reads PYTHON_VERSION/PYTHON_PATCH/PY_STANDALONE_TAG).
-# The patch + standalone tag mirror the l4t13 cp312 pin — a known-good
-# python-build-standalone release that also ships an aarch64-apple-darwin asset.
-if [ "$(uname -s)" = "Darwin" ]; then
-    PYTHON_VERSION="3.12"
-    PYTHON_PATCH="12"
-    PY_STANDALONE_TAG="20251120"
-fi
-
 # JetPack 7 / L4T arm64 vllm + torch wheels come straight from PyPI now
 # (torch 2.11+ ships aarch64 + cu130 manylinux wheels and vllm 0.20+ ships
 # an aarch64 wheel pinned to that torch). They're cp312-only, so bump the
@@ -75,87 +57,11 @@ if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
    PY_STANDALONE_TAG="20251120"
 fi

-# ===================== Apple Silicon (Metal/MLX) =====================
-# Reproduce vllm-metal's upstream installer
-# (curl -fsSL https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh)
-# but INTO LocalAI's managed venv (ensureVenv) instead of a throwaway
-# ~/.venv-vllm-metal, so the backend integrates with LocalAI's venv lifecycle
-# (portable CPython, _makeVenvPortable relocation, runtime activation). The
-# normal CUDA/CPU installRequirements is skipped on darwin — there is no
-# macOS/arm64 vLLM wheel on PyPI; vLLM is built from source and the MLX engine
-# is layered on by the vllm-metal wheel.
-if [ "$(uname -s)" = "Darwin" ]; then
-    # Create/activate the portable 3.12 venv. On darwin USE_PIP=true and
-    # PORTABLE_PYTHON=true (set by scripts/build/python-darwin.sh), so this is a
-    # `python -m venv` based, relocatable venv.
-    ensureVenv
-
-    # vllm-metal's installer drives everything through `uv`: building vLLM from
-    # the CPU requirements needs `--index-strategy unsafe-best-match` (mixes the
-    # pytorch CPU channel with PyPI), a flag plain pip does not have. The darwin
-    # venv is pip-based, so bootstrap uv into it. uv honours $VIRTUAL_ENV (set by
-    # libbackend's _activateVenv) and installs into THIS venv — same pattern the
-    # intel branch below relies on.
-    pip install uv
-
-    # The ONLY darwin version pin -- AUTO-BUMPED by .github/bump_vllm_metal.sh,
-    # which tracks vllm-project/vllm-metal releases (NOT vllm/vllm latest). Keep
-    # it as a plain double-quoted assignment on its own line so the bumper's sed
-    # can rewrite it. Darwin therefore follows vllm-metal and can lag the Linux
-    # vllm pin (requirements-cublas13-after.txt, bumped independently against
-    # vllm/vllm) until vllm-metal supports a newer vLLM.
-    VLLM_METAL_VERSION="v0.3.0.dev20260622062346"
-
-    # The coupled vLLM source version is whatever this vllm-metal release builds
-    # against -- it declares it in its own installer as `vllm_v=`. Derive it from
-    # the PINNED tag rather than hardcoding a second value that could drift. The
-    # tag is immutable, so this stays reproducible across rebuilds.
-    VLLM_VERSION=$(curl -fsSL "https://raw.githubusercontent.com/vllm-project/vllm-metal/${VLLM_METAL_VERSION}/install.sh" \
-        | grep -oE 'vllm_v="[0-9]+\.[0-9]+\.[0-9]+"' | head -n1 | cut -d'"' -f2)
-    if [ -z "${VLLM_VERSION}" ]; then
-        echo "ERROR: could not derive the vLLM version from vllm-metal ${VLLM_METAL_VERSION}" >&2
-        exit 1
-    fi
-    echo "vllm-metal ${VLLM_METAL_VERSION} builds against vLLM ${VLLM_VERSION}"
-
-    _vllm_src=$(mktemp -d)
-    trap 'rm -rf "${_vllm_src}"' EXIT
-    pushd "${_vllm_src}"
-        # 1) Build vLLM ${VLLM_VERSION} from the release source tarball against
-        #    the CPU requirements. vllm-metal layers its MLX platform plugin on
-        #    top of this exact build.
-        curl -fsSL -o "vllm-${VLLM_VERSION}.tar.gz" \
-            "https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}.tar.gz"
-        tar -xzf "vllm-${VLLM_VERSION}.tar.gz"
-        pushd "vllm-${VLLM_VERSION}"
-            uv pip install -r requirements/cpu.txt --index-strategy unsafe-best-match
-            # -Wno-parentheses: clang on macOS treats one of vLLM's C++ warnings
-            # as an error without it (matches the upstream installer's CXXFLAGS).
-            CXXFLAGS="-Wno-parentheses" uv pip install .
-        popd
-    popd
-
-    # 2) Install the prebuilt vllm-metal wheel for the PINNED release. It pulls
-    #    mlx / mlx-metal as deps and registers the `metal` platform plugin that
-    #    backend.py resolves to at engine-init time. Build the release-asset URL
-    #    deterministically (tag + the cp312/arm64 wheel name) rather than querying
-    #    api.github.com, whose unauthenticated rate limit (60/hr per IP) 403s on
-    #    shared CI runners. The wheel version is the tag without its leading 'v'.
-    _metal_wheel="vllm_metal-${VLLM_METAL_VERSION#v}-cp312-cp312-macosx_11_0_arm64.whl"
-    _metal_wheel_url="https://github.com/vllm-project/vllm-metal/releases/download/${VLLM_METAL_VERSION}/${_metal_wheel}"
-    echo "Installing vllm-metal wheel: ${_metal_wheel_url}"
-    uv pip install "${_metal_wheel_url}"
-
-    # Generate the gRPC stubs (backend_pb2*). installRequirements normally does
-    # this via runProtogen at the end; we skipped installRequirements on darwin,
-    # so call it explicitly here.
-    runProtogen
-
 # Intel XPU has no upstream-published vllm wheels, so we always build vllm
 # from source against torch-xpu and replace the default triton with
 # triton-xpu (matching torch 2.11). Mirrors the upstream procedure:
 # https://github.com/vllm-project/vllm/blob/main/docs/getting_started/installation/gpu.xpu.inc.md
-elif [ "x${BUILD_TYPE}" == "xintel" ]; then
+if [ "x${BUILD_TYPE}" == "xintel" ]; then
    # Hide requirements-intel-after.txt so installRequirements doesn't
    # try `pip install vllm` (would either fail or grab a non-XPU wheel).
    _intel_after="${backend_dir}/requirements-intel-after.txt"