fix(config): gate parallel-slot default on per-device VRAM too (#10485)

The first #10485 fix (#10494) made the Blackwell physical-batch boost per-device/context-aware, which neutralized the big compute-buffer OOM, but the reporter's 2x16 GiB consumer Blackwell still OOM'd. Tracing the post-fix log: the model now loads its weights, builds the main context and warms up fine, and dies only on the *last* allocation: the MTP draft context's 800 MiB KV cache on the tighter device.

#10411 changed only two defaults: the physical batch (now gated) and a VRAM-scaled parallel-slot count. The KV cache is unified (n_ctx_seq == full context proves slots share the budget, so parallel doesn't multiply KV), but n_seq_max=4 still adds per-slot compute-graph / context-checkpoint / output scratch. On a device packed ~99% by a 27B model spanning both cards, that overhead is the few-hundred-MiB straw, which is why reverting #10411 (and only #10411) restores a working load.

Gate the parallel-slot default on the same per-device headroom predicate as the batch boost: when a large context already fills a single card (largeContextForDevice), keep n_parallel=1. A user running one big-context model that barely fits across two consumer GPUs is not serving four concurrent tenants. Small contexts and large unified-memory devices (GB10) keep full concurrency. Applied on both the single-host path and the distributed router.

Also make the auto-tuning visible and reversible (the debugging here needed DEBUG logs and a git bisect):

- Log the effective performance-relevant runtime options at INFO once per model load ("effective runtime tuning …": context, n_batch, n_gpu_layers, parallel, flash_attention, f16) so an admin can see what will run and pin or override any value in the model YAML.
- LOCALAI_DISABLE_HARDWARE_DEFAULTS=true skips the hardware auto-tuning entirely (mirrors LOCALAI_DISABLE_GUESSING) for stock llama.cpp behavior.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4.8 [Claude Code]
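
The gating rule described in the message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the threshold inside `largeContextForDevice` (context footprint above half of a card), the `defaultParallel` helper, and the MiB figures are all assumptions made for the example. Only the predicate name and the "keep n_parallel=1 when one card is already tight" behavior come from the message.

```go
package main

import "fmt"

// largeContextForDevice is a stand-in for the predicate named in the commit
// message. ASSUMPTION: the real threshold is not shown there; this sketch
// treats a context as "large" when its estimated footprint exceeds half of
// the device's VRAM.
func largeContextForDevice(ctxMiB, vramMiB uint64) bool {
	return ctxMiB*2 > vramMiB
}

// defaultParallel is a hypothetical helper. vramScaled is the VRAM-scaled
// slot count that would apply without the gate.
func defaultParallel(ctxMiB uint64, perDeviceVRAMMiB []uint64, vramScaled int) int {
	for _, vram := range perDeviceVRAMMiB {
		if largeContextForDevice(ctxMiB, vram) {
			// A single card is already tight: keep one slot so no extra
			// per-slot scratch is allocated.
			return 1
		}
	}
	return vramScaled
}

func main() {
	// Two 16 GiB cards with a large context: gated down to one slot.
	fmt.Println(defaultParallel(12000, []uint64{16384, 16384}, 4)) // prints 1
	// One large unified-memory device: full concurrency is kept.
	fmt.Println(defaultParallel(12000, []uint64{131072}, 4)) // prints 4
	// Small context on the same two cards: full concurrency is kept.
	fmt.Println(defaultParallel(2048, []uint64{16384, 16384}, 4)) // prints 4
}
```

The check is per device rather than on the summed VRAM because, as the message notes, the failing allocation landed on the tighter of the two cards.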
chore(model gallery): 🤖 add 1 new models via gallery agent (#10505)
2026-06-25 09:09:07 -04:00 · 2026-06-25 12:57:19 +00:00 · 2026-06-25 08:11:52 +02:00 · 2026-06-25 08:11:31 +02:00 · 2026-06-25 08:11:17 +02:00 · 2026-06-25 08:10:59 +02:00
146 changed files with 3010 additions and 1029 deletions
--- a/.github/backend-matrix.yml
+++ b/.github/backend-matrix.yml
@@ -4974,6 +4974,12 @@ includeDarwin:
  - backend: "kitten-tts"
    tag-suffix: "-metal-darwin-arm64-kitten-tts"
    build-type: "mps"
+  - backend: "trl"
+    tag-suffix: "-metal-darwin-arm64-trl"
+    build-type: "mps"
+  - backend: "liquid-audio"
+    tag-suffix: "-metal-darwin-arm64-liquid-audio"
+    build-type: "mps"
  - backend: "piper"
    tag-suffix: "-metal-darwin-arm64-piper"
    build-type: "metal"
@@ -4990,6 +4996,10 @@ includeDarwin:
    tag-suffix: "-metal-darwin-arm64-sherpa-onnx"
    build-type: "metal"
    lang: "go"
+  - backend: "supertonic"
+    tag-suffix: "-metal-darwin-arm64-supertonic"
+    build-type: "metal"
+    lang: "go"
  - backend: "local-store"
    tag-suffix: "-metal-darwin-arm64-local-store"
    build-type: "metal"
--- a/backend/cpp/ik-llama-cpp/Makefile
+++ b/backend/cpp/ik-llama-cpp/Makefile
@@ -1,5 +1,5 @@

-IK_LLAMA_VERSION?=6c00e87ac84404af588ad2e65935bd6f079c696f
+IK_LLAMA_VERSION?=d5507e33ae7ee2b7b41475f08044d3bde3b839ee
 LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp

 CMAKE_ARGS?=
--- a/backend/cpp/llama-cpp/Makefile
+++ b/backend/cpp/llama-cpp/Makefile
@@ -1,5 +1,5 @@

-LLAMA_VERSION?=7c082bc417bbe53210a83df4ba5b49e18ce6193c
+LLAMA_VERSION?=8be759e6f70d629638a7eb70db3824cbdcea370b
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

 CMAKE_ARGS?=
--- a/backend/cpp/llama-cpp/grpc-server.cpp
+++ b/backend/cpp/llama-cpp/grpc-server.cpp
@@ -37,6 +37,7 @@
 #include "backend.pb.h"
 #include "backend.grpc.pb.h"
 #include "common.h"
+#include "arg.h"
 #include "chat-auto-parser.h"
 #include <getopt.h>
 #include <grpcpp/ext/proto_server_reflection_plugin.h>
@@ -592,6 +593,10 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
    params.checkpoint_min_step = 256;
 #endif

+    // Raw upstream llama-server flags collected from any option entry that
+    // starts with '-'. Applied once after the loop via common_params_parse.
+    std::vector<std::string> extra_argv;
+
     // decode options. Options are in form optname:optvale, or if booleans only optname.
    for (int i = 0; i < request->options_size(); i++) {
        std::string opt = request->options(i);
@@ -1080,6 +1085,31 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
                } catch (...) {}
            }

+        // --- main model MoE on CPU (upstream --cpu-moe / --n-cpu-moe) ---
+        } else if (!strcmp(optname, "cpu_moe")) {
+            // Bool-style flag: keep all MoE expert weights on CPU.
+            const bool enable = (optval == NULL) ||
+                optval_str == "true" || optval_str == "1" || optval_str == "yes" ||
+                optval_str == "on" || optval_str == "enabled";
+            if (enable) {
+                params.tensor_buft_overrides.push_back(llm_ffn_exps_cpu_override());
+            }
+        } else if (!strcmp(optname, "n_cpu_moe")) {
+            if (optval != NULL) {
+                try {
+                    int n = std::stoi(optval_str);
+                    if (n < 0) n = 0;
+                    // Keep override-name storage alive for the lifetime of the
+                    // params struct (mirrors upstream arg.cpp's function-local static).
+                    static std::list<std::string> buft_overrides_main;
+                    for (int i = 0; i < n; ++i) {
+                        buft_overrides_main.push_back(llm_ffn_exps_block_regex(i));
+                        params.tensor_buft_overrides.push_back(
+                            {buft_overrides_main.back().c_str(), ggml_backend_cpu_buffer_type()});
+                    }
+                } catch (...) {}
+            }
+
        // --- draft model tensor buffer overrides (upstream --spec-draft-override-tensor) ---
        } else if (!strcmp(optname, "draft_override_tensor") || !strcmp(optname, "spec_draft_override_tensor")) {
            // Format: <tensor regex>=<buffer type>,<tensor regex>=<buffer type>,...
@@ -1111,6 +1141,30 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
                else { cur.push_back(c); }
            }
            if (!cur.empty()) flush(cur);
+
+        // --- generic passthrough: any entry starting with '-' is a raw
+        //     upstream llama-server flag, forwarded verbatim to the parser. ---
+        } else if (optname[0] == '-') {
+            std::string flag = optname;
+            // These flags make upstream's parser exit() (printing usage /
+            // completion), which would kill the backend process. Skip them.
+            if (flag == "-h" || flag == "--help" || flag == "--usage" ||
+                flag == "--version" || flag == "--license" ||
+                flag == "--list-devices" || flag == "-cl" ||
+                flag == "--cache-list" ||
+                flag.rfind("--completion", 0) == 0) {
+                fprintf(stderr,
+                    "[llama-cpp] ignoring passthrough flag that would exit: %s\n",
+                    flag.c_str());
+            } else {
+                extra_argv.push_back(flag);
+                // Preserve the whole value after the first ':' so embedded
+                // colons (e.g. host:port) survive strtok's truncation of optval.
+                auto colon = opt.find(':');
+                if (colon != std::string::npos) {
+                    extra_argv.push_back(opt.substr(colon + 1));
+                }
+            }
        }
    }

@@ -1146,27 +1200,6 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
        }
    }

-    if (!params.kv_overrides.empty()) {
-        params.kv_overrides.emplace_back();
-        params.kv_overrides.back().key[0] = 0;
-    }
-
-    // tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp).
-    // Real entries are pushed during option parsing; here we pad/terminate so the
-    // model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543)
-    // and so llama_params_fit has the placeholder slots it requires.
-    {
-        const size_t ntbo = llama_max_tensor_buft_overrides();
-        while (params.tensor_buft_overrides.size() < ntbo) {
-            params.tensor_buft_overrides.push_back({nullptr, nullptr});
-        }
-    }
-    // Terminate the draft tensor_buft_overrides list with a sentinel, mirroring
-    // the main-model handling above.
-    if (!params.speculative.draft.tensor_buft_overrides.empty()) {
-        params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr});
-    }
-
    // TODO: Add yarn

    if (!request->tensorsplit().empty()) {
@@ -1259,6 +1292,69 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
            params.sampling.grammar_triggers.push_back(std::move(trigger));
        }
    }
+
+    // Apply any raw upstream flags last so an explicit passthrough flag wins
+    // over the LocalAI-resolved field it maps to (e.g. --ctx-size beats
+    // context_size). This is the same parser llama-server itself uses.
+    if (!extra_argv.empty()) {
+        // common_params_parser_init resets a few fields for the SERVER example
+        // (n_parallel -> -1, use_color). Snapshot n_parallel so an unrelated
+        // passthrough flag can't silently clobber LocalAI's resolved value.
+        const int saved_n_parallel = params.n_parallel;
+
+        std::vector<char *> argv;
+        std::string prog = "llama-server";
+        argv.push_back(prog.data());
+        for (auto & a : extra_argv) {
+            argv.push_back(a.data());
+        }
+
+        // ctx_arg.params is a reference, so this overlays the given flags onto
+        // `params` in place. Returns false on a recoverable parse error (and
+        // self-restores params); may exit() on a hard error, exactly as
+        // passing the same bad flag to llama-server would.
+        if (!common_params_parse((int)argv.size(), argv.data(), params,
+                                 LLAMA_EXAMPLE_SERVER)) {
+            fprintf(stderr,
+                "[llama-cpp] failed to parse passthrough options; ignoring them\n");
+        }
+
+        // Restore n_parallel unless a passthrough flag explicitly set it
+        // (parser_init's reset sentinel for SERVER is -1).
+        if (params.n_parallel == -1) {
+            params.n_parallel = saved_n_parallel;
+        }
+    }
+
+    // Terminate/pad the override vectors only after BOTH the named-option loop
+    // and the generic passthrough (common_params_parse above) have pushed their
+    // real entries, so back() is the null sentinel the model loader asserts on.
+    // Running these before the passthrough let a passthrough flag (--cpu-moe,
+    // --override-tensor, --override-kv, ...) append a real entry after the
+    // sentinel: a GGML_ASSERT crash for tensor_buft_overrides, a silent drop for
+    // kv_overrides. Double-termination is harmless (the while is a no-op if the
+    // passthrough parse already padded; an extra trailing null is ignored).
+
+    if (!params.kv_overrides.empty()) {
+        params.kv_overrides.emplace_back();
+        params.kv_overrides.back().key[0] = 0;
+    }
+
+    // tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp).
+    // Real entries are pushed during option parsing; here we pad/terminate so the
+    // model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543)
+    // and so llama_params_fit has the placeholder slots it requires.
+    {
+        const size_t ntbo = llama_max_tensor_buft_overrides();
+        while (params.tensor_buft_overrides.size() < ntbo) {
+            params.tensor_buft_overrides.push_back({nullptr, nullptr});
+        }
+    }
+    // Terminate the draft tensor_buft_overrides list with a sentinel, mirroring
+    // the main-model handling above.
+    if (!params.speculative.draft.tensor_buft_overrides.empty()) {
+        params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr});
+    }
 }


--- a/backend/go/acestep-cpp/Makefile
+++ b/backend/go/acestep-cpp/Makefile
@@ -117,7 +117,8 @@ libgoacestepcpp-custom: CMakeLists.txt cpp/goacestepcpp.cpp cpp/goacestepcpp.h
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) --target goacestepcpp && \
 	cd .. && \
-	mv build-$(SO_TARGET)/libgoacestepcpp.so ./$(SO_TARGET)
+	(mv build-$(SO_TARGET)/libgoacestepcpp.so ./$(SO_TARGET) 2>/dev/null || \
+	 mv build-$(SO_TARGET)/libgoacestepcpp.dylib ./$(SO_TARGET) 2>/dev/null)

 test: acestep-cpp
 	@echo "Running acestep-cpp tests..."
--- a/backend/go/acestep-cpp/main.go
+++ b/backend/go/acestep-cpp/main.go
@@ -4,6 +4,7 @@ package main
 import (
 	"flag"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,7 +23,11 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("ACESTEP_LIBRARY")
 	if libName == "" {
-		libName = "./libgoacestepcpp-fallback.so"
+		if runtime.GOOS == "darwin" {
+			libName = "./libgoacestepcpp-fallback.dylib"
+		} else {
+			libName = "./libgoacestepcpp-fallback.so"
+		}
 	}

 	gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/acestep-cpp/package.sh
+++ b/backend/go/acestep-cpp/package.sh
@@ -13,6 +13,7 @@ mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/acestep-cpp $CURDIR/package/
 cp -fv $CURDIR/libgoacestepcpp-*.so $CURDIR/package/
+cp -fv $CURDIR/libgoacestepcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/acestep-cpp/run.sh
+++ b/backend/go/acestep-cpp/run.sh
@@ -12,9 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
+if [ "$(uname)" = "Darwin" ]; then
+	# macOS: single library variant (Metal or Accelerate). The goacestepcpp
+	# target is built as a CMake MODULE, which emits a .dylib for a SHARED
+	# build but a .so for a MODULE build on Apple, so prefer .dylib and fall
+	# back to .so.
+	LIBRARY="$CURDIR/libgoacestepcpp-fallback.dylib"
+	if [ ! -e "$LIBRARY" ]; then
+		LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"
+	fi
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+else
+	LIBRARY="$CURDIR/libgoacestepcpp-fallback.so"

-if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
 		if [ -e $CURDIR/libgoacestepcpp-avx.so ]; then
@@ -36,9 +46,10 @@ if [ "$(uname)" != "Darwin" ]; then
 			LIBRARY="$CURDIR/libgoacestepcpp-avx512.so"
 		fi
 	fi
+
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 fi

-export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export ACESTEP_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
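
The library-selection pattern this diff repeats across backends (prefer `.dylib` on macOS, fall back to `.so`, use `.so` elsewhere) can be sketched as a single Go helper. This is an illustrative sketch only: `libraryCandidates` and `pickLibrary` are hypothetical names, and it merges the fallback from `run.sh` with the `runtime.GOOS` check from `main.go`, which the diff keeps in separate files.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// libraryCandidates returns the file names to try, in order of preference,
// for a library base path given without an extension.
func libraryCandidates(goos, base string) []string {
	if goos == "darwin" {
		// Prefer .dylib, but a CMake MODULE build on Apple emits .so.
		return []string{base + ".dylib", base + ".so"}
	}
	return []string{base + ".so"}
}

// pickLibrary returns the first candidate that exists, or the preferred
// candidate if none does, so the later load error names the expected file.
func pickLibrary(goos, base string, exists func(string) bool) string {
	candidates := libraryCandidates(goos, base)
	for _, c := range candidates {
		if exists(c) {
			return c
		}
	}
	return candidates[0]
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	fmt.Println(pickLibrary(runtime.GOOS, "./libgoacestepcpp-fallback", fileExists))
}
```

Passing the existence check in as a function keeps the selection logic testable without touching the filesystem.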
--- a/backend/go/ced/Makefile
+++ b/backend/go/ced/Makefile
@@ -57,6 +57,7 @@ libced.so: sources/ced.cpp
 	cmake -B sources/ced.cpp/build-shared -S sources/ced.cpp $(CMAKE_ARGS)
 	cmake --build sources/ced.cpp/build-shared --config Release -j$(JOBS)
 	cp -fv sources/ced.cpp/build-shared/libced.so* ./ 2>/dev/null || true
+	cp -fv sources/ced.cpp/build-shared/libced.dylib ./ 2>/dev/null || true
 	cp -fv sources/ced.cpp/include/ced_capi.h ./

 ced-grpc: libced.so main.go goced.go
--- a/backend/go/ced/main.go
+++ b/backend/go/ced/main.go
@@ -12,6 +12,7 @@ import (
 	"flag"
 	"fmt"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -27,7 +28,11 @@ type libFunc struct {
 func main() {
 	libName := os.Getenv("CED_LIBRARY")
 	if libName == "" {
-		libName = "libced.so"
+		if runtime.GOOS == "darwin" {
+			libName = "libced.dylib"
+		} else {
+			libName = "libced.so"
+		}
 	}
 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
 	if err != nil {
--- a/backend/go/ced/package.sh
+++ b/backend/go/ced/package.sh
@@ -15,10 +15,12 @@ mkdir -p "$CURDIR/package/lib"
 cp -avf "$CURDIR/ced-grpc" "$CURDIR/package/"
 cp -avf "$CURDIR/run.sh" "$CURDIR/package/"

-cp -avf "$CURDIR"/libced.so* "$CURDIR/package/lib/" 2>/dev/null || {
-	echo "ERROR: libced.so not found in $CURDIR, run 'make' first" >&2
+cp -avf "$CURDIR"/libced.so* "$CURDIR/package/lib/" 2>/dev/null || true
+cp -avf "$CURDIR"/libced.dylib "$CURDIR/package/lib/" 2>/dev/null || true
+if ! ls "$CURDIR"/package/lib/libced.* >/dev/null 2>&1; then
+	echo "ERROR: libced shared library not found in $CURDIR, run 'make' first" >&2
 	exit 1
-}
+fi

 if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
    echo "Detected x86_64 architecture, copying x86_64 libraries..."
--- a/backend/go/ced/run.sh
+++ b/backend/go/ced/run.sh
@@ -3,7 +3,12 @@ set -e

 CURDIR=$(dirname "$(realpath "$0")")

-export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
+if [ "$(uname)" = "Darwin" ]; then
+	export DYLD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${DYLD_LIBRARY_PATH:-}"
+	export CED_LIBRARY="$CURDIR/lib/libced.dylib"
+else
+	export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
+fi

 # If a self-contained ld.so was packaged, route through it so the packaged
 # libc / libstdc++ are used instead of the host's (matches the sibling backends).
--- a/backend/go/crispasr/Makefile
+++ b/backend/go/crispasr/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # CrispASR version (release tag)
 CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR
-CRISPASR_VERSION?=7a8cb80907341c0204bd0488c1244764f4163883
+CRISPASR_VERSION?=96b2a6ee31d30389fed8a7ef1a54239b75231ddc
 SO_TARGET?=libgocrispasr.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -75,7 +75,8 @@ UNAME_S := $(shell uname -s)
 ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = libgocrispasr-avx.so libgocrispasr-avx2.so libgocrispasr-avx512.so libgocrispasr-fallback.so
 else
-	VARIANT_TARGETS = libgocrispasr-fallback.so
+	# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
+	VARIANT_TARGETS = libgocrispasr-fallback.dylib
 endif

 crispasr: main.go gocrispasr.go $(VARIANT_TARGETS)
@@ -87,7 +88,7 @@ package: crispasr
 build: package

 clean: purge
-	rm -rf libgocrispasr*.so package sources/CrispASR crispasr
+	rm -rf libgocrispasr*.so libgocrispasr*.dylib package sources/CrispASR crispasr

 purge:
 	rm -rf build*
@@ -118,13 +119,21 @@ libgocrispasr-fallback.so: sources/CrispASR
 	SO_TARGET=libgocrispasr-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgocrispasr-custom
 	rm -rfv build*

+# Build fallback variant as a dylib (Darwin)
+libgocrispasr-fallback.dylib: sources/CrispASR
+	$(MAKE) purge
+	$(info ${GREEN}I crispasr build info:fallback (dylib)${RESET})
+	SO_TARGET=libgocrispasr-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgocrispasr-custom
+	rm -rfv build*
+
 libgocrispasr-custom: CMakeLists.txt cpp/crispasr_shim.cpp cpp/crispasr_shim.h
 	mkdir -p build-$(SO_TARGET) && \
 	cd build-$(SO_TARGET) && \
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	mv build-$(SO_TARGET)/libgocrispasr.so ./$(SO_TARGET)
+	(mv build-$(SO_TARGET)/libgocrispasr.so ./$(SO_TARGET) 2>/dev/null || \
+	 mv build-$(SO_TARGET)/libgocrispasr.dylib ./$(SO_TARGET) 2>/dev/null)

 test: crispasr
 	CGO_ENABLED=0 $(GOCMD) test -v ./...
--- a/backend/go/crispasr/main.go
+++ b/backend/go/crispasr/main.go
@@ -4,6 +4,7 @@ package main
 import (
 	"flag"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
 func main() {
 	libName := os.Getenv("CRISPASR_LIBRARY")
 	if libName == "" {
-		libName = "./libgocrispasr-fallback.so"
+		if runtime.GOOS == "darwin" {
+			libName = "./libgocrispasr-fallback.dylib"
+		} else {
+			libName = "./libgocrispasr-fallback.so"
+		}
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/crispasr/package.sh
+++ b/backend/go/crispasr/package.sh
@@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
 mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/crispasr $CURDIR/package/
-cp -fv $CURDIR/libgocrispasr-*.so $CURDIR/package/
+cp -fv $CURDIR/libgocrispasr-*.so $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/libgocrispasr-*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/crispasr/run.sh
+++ b/backend/go/crispasr/run.sh
@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-LIBRARY="$CURDIR/libgocrispasr-fallback.so"
+if [ "$(uname)" = "Darwin" ]; then
+	# macOS: single dylib variant (Metal or Accelerate)
+	LIBRARY="$CURDIR/libgocrispasr-fallback.dylib"
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+else
+	LIBRARY="$CURDIR/libgocrispasr-fallback.so"

-if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
 		if [ -e $CURDIR/libgocrispasr-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
 			LIBRARY="$CURDIR/libgocrispasr-avx512.so"
 		fi
 	fi
+
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 fi

-export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export CRISPASR_LIBRARY=$LIBRARY

 # Point piper's espeak-ng phonemizer at the bundled voice data. The variable
--- a/backend/go/depth-anything-cpp/Makefile
+++ b/backend/go/depth-anything-cpp/Makefile
@@ -77,7 +77,7 @@ ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = libdepthanythingcpp-avx.so libdepthanythingcpp-avx2.so libdepthanythingcpp-avx512.so libdepthanythingcpp-fallback.so
 else
 	# On non-Linux (e.g., Darwin), build only fallback variant
-	VARIANT_TARGETS = libdepthanythingcpp-fallback.so
+	VARIANT_TARGETS = libdepthanythingcpp-fallback.dylib
 endif

 depth-anything-cpp: main.go godepthanythingcpp.go $(VARIANT_TARGETS)
@@ -89,7 +89,7 @@ package: depth-anything-cpp
 build: package

 clean: purge
-	rm -rf libdepthanythingcpp*.so depth-anything-cpp package sources
+	rm -rf libdepthanythingcpp*.so libdepthanythingcpp*.dylib depth-anything-cpp package sources

 purge:
 	rm -rf build*
@@ -116,11 +116,19 @@ libdepthanythingcpp-avx512.so: sources/depth-anything.cpp
 endif

 # Build fallback variant (all platforms)
+ifeq ($(UNAME_S),Darwin)
+libdepthanythingcpp-fallback.dylib: sources/depth-anything.cpp
+	rm -rfv build-$@
+	$(info ${GREEN}I depth-anything-cpp build info:fallback${RESET})
+	SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
+	rm -rfv build-$@
+else
 libdepthanythingcpp-fallback.so: sources/depth-anything.cpp
 	rm -rfv build-$@
 	$(info ${GREEN}I depth-anything-cpp build info:fallback${RESET})
 	SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom
 	rm -rfv build-$@
+endif

 libdepthanythingcpp-custom: CMakeLists.txt
 	mkdir -p build-$(SO_TARGET) && \
@@ -128,7 +136,8 @@ libdepthanythingcpp-custom: CMakeLists.txt
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET)
+	(mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET) 2>/dev/null || \
+	 mv build-$(SO_TARGET)/libdepthanything.dylib ./$(SO_TARGET) 2>/dev/null)

 all: depth-anything-cpp package

--- a/backend/go/depth-anything-cpp/main.go
+++ b/backend/go/depth-anything-cpp/main.go
@@ -9,6 +9,7 @@ package main
 import (
 	"flag"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -27,7 +28,11 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("DEPTHANYTHING_LIBRARY")
 	if libName == "" {
-		libName = "./libdepthanythingcpp-fallback.so"
+		if runtime.GOOS == "darwin" {
+			libName = "./libdepthanythingcpp-fallback.dylib"
+		} else {
+			libName = "./libdepthanythingcpp-fallback.so"
+		}
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/depth-anything-cpp/package.sh
+++ b/backend/go/depth-anything-cpp/package.sh
@@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.."
 # Create lib directory
 mkdir -p $CURDIR/package/lib

-cp -avf $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/
+cp -fv $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/libdepthanythingcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -avf $CURDIR/depth-anything-cpp $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

--- a/backend/go/depth-anything-cpp/run.sh
+++ b/backend/go/depth-anything-cpp/run.sh
@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so"
+if [ "$(uname)" = "Darwin" ]; then
+	# macOS: single dylib variant (Metal or Accelerate)
+	LIBRARY="$CURDIR/libdepthanythingcpp-fallback.dylib"
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+else
+	LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so"

-if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
 		if [ -e $CURDIR/libdepthanythingcpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
 			LIBRARY="$CURDIR/libdepthanythingcpp-avx512.so"
 		fi
 	fi
+
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 fi

-export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export DEPTHANYTHING_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
--- a/backend/go/localvqe/Makefile
+++ b/backend/go/localvqe/Makefile
@@ -67,8 +67,9 @@ $(LIB_SENTINEL): sources/LocalVQE
 	# that the loader picks at runtime. We must build every target — the
 	# default `--target localvqe_shared` drops these. CMAKE_LIBRARY_OUTPUT_DIRECTORY
 	# routes all of them into build/bin; copy them out next to the binary.
-	cp -P build/bin/liblocalvqe.so* . 2>/dev/null || cp -P build/liblocalvqe.so* .
+	cp -P build/bin/liblocalvqe.so* . 2>/dev/null || cp -P build/bin/liblocalvqe.dylib . 2>/dev/null || cp -P build/liblocalvqe.so* . 2>/dev/null || cp -P build/liblocalvqe.dylib .
 	cp -P build/bin/libggml*.so* . 2>/dev/null || true
+	cp -P build/bin/libggml*.dylib . 2>/dev/null || true
 	touch $(LIB_SENTINEL)

 liblocalvqe.so: $(LIB_SENTINEL)
--- a/backend/go/localvqe/main.go
+++ b/backend/go/localvqe/main.go
@@ -4,6 +4,7 @@ package main
 import (
 	"flag"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
 func main() {
 	libName := os.Getenv("LOCALVQE_LIBRARY")
 	if libName == "" {
-		libName = "./liblocalvqe.so"
+		if runtime.GOOS == "darwin" {
+			libName = "./liblocalvqe.dylib"
+		} else {
+			libName = "./liblocalvqe.so"
+		}
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/localvqe/package.sh
+++ b/backend/go/localvqe/package.sh
@@ -15,7 +15,9 @@ cp -avf $CURDIR/localvqe $CURDIR/package/
 # liblocalvqe.so* (with SOVERSION symlinks) and the libggml-*.so runtime
 # variants — LocalVQE picks the matching CPU variant at load time.
 cp -P $CURDIR/liblocalvqe.so* $CURDIR/package/ 2>/dev/null || true
+cp -P $CURDIR/liblocalvqe.dylib $CURDIR/package/ 2>/dev/null || true
 cp -P $CURDIR/libggml*.so* $CURDIR/package/ 2>/dev/null || true
+cp -P $CURDIR/libggml*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/localvqe/run.sh
+++ b/backend/go/localvqe/run.sh
@@ -10,8 +10,19 @@ CURDIR=$(dirname "$(realpath $0)")
 # exec'ing the binary.
 cd "$CURDIR"

-export LD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$LD_LIBRARY_PATH
-export LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so
+if [ "$(uname)" = "Darwin" ]; then
+	# macOS: LocalVQE is built as a SHARED library, so dyld needs the .dylib +
+	# DYLD_LIBRARY_PATH. Prefer .dylib and fall back to .so just in case.
+	export DYLD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$DYLD_LIBRARY_PATH
+	LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.dylib
+	if [ ! -e "$LOCALVQE_LIBRARY" ]; then
+		LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so
+	fi
+	export LOCALVQE_LIBRARY
+else
+	export LD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$LD_LIBRARY_PATH
+	export LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so
+fi

 if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
--- a/backend/go/locate-anything-cpp/Makefile
+++ b/backend/go/locate-anything-cpp/Makefile
@@ -70,7 +70,7 @@ ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = liblocateanythingcpp-avx.so liblocateanythingcpp-avx2.so liblocateanythingcpp-avx512.so liblocateanythingcpp-fallback.so
 else
 	# On non-Linux (e.g., Darwin), build only fallback variant
-	VARIANT_TARGETS = liblocateanythingcpp-fallback.so
+	VARIANT_TARGETS = liblocateanythingcpp-fallback.dylib
 endif

 locate-anything-cpp: main.go golocateanythingcpp.go $(VARIANT_TARGETS)
@@ -82,7 +82,7 @@ package: locate-anything-cpp
 build: package

 clean: purge
-	rm -rf liblocateanythingcpp*.so locate-anything-cpp package sources
+	rm -rf liblocateanythingcpp*.so liblocateanythingcpp*.dylib locate-anything-cpp package sources

 purge:
 	rm -rf build*
@@ -109,11 +109,19 @@ liblocateanythingcpp-avx512.so: sources/locate-anything.cpp
 endif

 # Build fallback variant (all platforms)
+ifeq ($(UNAME_S),Darwin)
+liblocateanythingcpp-fallback.dylib: sources/locate-anything.cpp
+	rm -rfv build-$@
+	$(info ${GREEN}I locate-anything-cpp build info:fallback${RESET})
+	SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom
+	rm -rfv build-$@
+else
 liblocateanythingcpp-fallback.so: sources/locate-anything.cpp
 	rm -rfv build-$@
 	$(info ${GREEN}I locate-anything-cpp build info:fallback${RESET})
 	SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom
 	rm -rfv build-$@
+endif

 liblocateanythingcpp-custom: CMakeLists.txt
 	mkdir -p build-$(SO_TARGET) && \
@@ -121,7 +129,8 @@ liblocateanythingcpp-custom: CMakeLists.txt
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	mv build-$(SO_TARGET)/liblocateanythingcpp.so ./$(SO_TARGET)
+	(mv build-$(SO_TARGET)/liblocateanythingcpp.so ./$(SO_TARGET) 2>/dev/null || \
+	 mv build-$(SO_TARGET)/liblocateanythingcpp.dylib ./$(SO_TARGET) 2>/dev/null)

 all: locate-anything-cpp package

--- a/backend/go/locate-anything-cpp/main.go
+++ b/backend/go/locate-anything-cpp/main.go
@@ -9,6 +9,7 @@ package main
 import (
 	"flag"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -27,7 +28,11 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("LOCATEANYTHING_LIBRARY")
 	if libName == "" {
-		libName = "./liblocateanythingcpp-fallback.so"
+		if runtime.GOOS == "darwin" {
+			libName = "./liblocateanythingcpp-fallback.dylib"
+		} else {
+			libName = "./liblocateanythingcpp-fallback.so"
+		}
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/locate-anything-cpp/package.sh
+++ b/backend/go/locate-anything-cpp/package.sh
@@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.."
 # Create lib directory
 mkdir -p $CURDIR/package/lib

-cp -avf $CURDIR/liblocateanythingcpp-*.so $CURDIR/package/
+cp -fv $CURDIR/liblocateanythingcpp-*.so $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/liblocateanythingcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -avf $CURDIR/locate-anything-cpp $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

--- a/backend/go/locate-anything-cpp/run.sh
+++ b/backend/go/locate-anything-cpp/run.sh
@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so"
+if [ "$(uname)" = "Darwin" ]; then
+	# macOS: single dylib variant (Metal or Accelerate)
+	LIBRARY="$CURDIR/liblocateanythingcpp-fallback.dylib"
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+else
+	LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so"

-if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
 		if [ -e $CURDIR/liblocateanythingcpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
 			LIBRARY="$CURDIR/liblocateanythingcpp-avx512.so"
 		fi
 	fi
+
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 fi

-export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export LOCATEANYTHING_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
--- a/backend/go/omnivoice-cpp/Makefile
+++ b/backend/go/omnivoice-cpp/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # omnivoice.cpp version
 OMNIVOICE_REPO?=https://github.com/ServeurpersoCom/omnivoice.cpp
-OMNIVOICE_VERSION?=96d30169afd5e6bb3fd6a0e9be0eb505bfe81fcd
+OMNIVOICE_VERSION?=0f37401bebe9b20c0160a888e592108fc1d17607
 SO_TARGET?=libgomnivoicecpp.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -65,7 +65,8 @@ UNAME_S := $(shell uname -s)
 ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = libgomnivoicecpp-avx.so libgomnivoicecpp-avx2.so libgomnivoicecpp-avx512.so libgomnivoicecpp-fallback.so
 else
-	VARIANT_TARGETS = libgomnivoicecpp-fallback.so
+	# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
+	VARIANT_TARGETS = libgomnivoicecpp-fallback.dylib
 endif

 omnivoice-cpp: main.go gomnivoicecpp.go $(VARIANT_TARGETS)
@@ -77,7 +78,7 @@ package: omnivoice-cpp
 build: package

 clean: purge
-	rm -rf libgomnivoicecpp*.so package sources/omnivoice.cpp omnivoice-cpp
+	rm -rf libgomnivoicecpp*.so libgomnivoicecpp*.dylib package sources/omnivoice.cpp omnivoice-cpp

 purge:
 	rm -rf build*
@@ -106,13 +107,20 @@ libgomnivoicecpp-fallback.so: sources/omnivoice.cpp
 	SO_TARGET=libgomnivoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
 	rm -rf build-libgomnivoicecpp-fallback.so

+# Build fallback variant as a dylib (Darwin)
+libgomnivoicecpp-fallback.dylib: sources/omnivoice.cpp
+	$(info ${GREEN}I omnivoice-cpp build info:fallback (dylib)${RESET})
+	SO_TARGET=libgomnivoicecpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom
+	rm -rf build-libgomnivoicecpp-fallback.dylib
+
 libgomnivoicecpp-custom: CMakeLists.txt cpp/gomnivoicecpp.cpp cpp/gomnivoicecpp.h
 	mkdir -p build-$(SO_TARGET) && \
 	cd build-$(SO_TARGET) && \
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) --target gomnivoicecpp && \
 	cd .. && \
-	mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET)
+	(mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET) 2>/dev/null || \
+	 mv build-$(SO_TARGET)/libgomnivoicecpp.dylib ./$(SO_TARGET) 2>/dev/null)

 test: omnivoice-cpp
 	@echo "Running omnivoice-cpp tests..."
--- a/backend/go/omnivoice-cpp/main.go
+++ b/backend/go/omnivoice-cpp/main.go
@@ -4,6 +4,7 @@ package main
 import (
 	"flag"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
 func main() {
 	libName := os.Getenv("OMNIVOICE_LIBRARY")
 	if libName == "" {
-		libName = "./libgomnivoicecpp-fallback.so"
+		if runtime.GOOS == "darwin" {
+			libName = "./libgomnivoicecpp-fallback.dylib"
+		} else {
+			libName = "./libgomnivoicecpp-fallback.so"
+		}
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/omnivoice-cpp/package.sh
+++ b/backend/go/omnivoice-cpp/package.sh
@@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
 mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/omnivoice-cpp $CURDIR/package/
-cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/
+cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/libgomnivoicecpp-*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/omnivoice-cpp/run.sh
+++ b/backend/go/omnivoice-cpp/run.sh
@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so"
+if [ "$(uname)" = "Darwin" ]; then
+	# macOS: single dylib variant (Metal or Accelerate)
+	LIBRARY="$CURDIR/libgomnivoicecpp-fallback.dylib"
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+else
+	LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so"

-if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
 		if [ -e $CURDIR/libgomnivoicecpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
 			LIBRARY="$CURDIR/libgomnivoicecpp-avx512.so"
 		fi
 	fi
+
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 fi

-export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export OMNIVOICE_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
--- a/backend/go/parakeet-cpp/Makefile
+++ b/backend/go/parakeet-cpp/Makefile
@@ -1,6 +1,6 @@
 # parakeet-cpp backend Makefile.
 #
-# Upstream pin lives below as PARAKEET_VERSION?=db755a78d39f789bb7d4e3935158a9e8105dbe36
+# Upstream pin lives below as PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
 # (.github/bump_deps.sh) can find and update it - matches the
 # whisper.cpp / ds4 / vibevoice-cpp convention.
 #
@@ -15,7 +15,7 @@
 # That's what the L0 smoke test uses. The default target below does the
 # proper clone-at-pin + cmake build so CI doesn't need a side-checkout.

-PARAKEET_VERSION?=db755a78d39f789bb7d4e3935158a9e8105dbe36
+PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
 PARAKEET_REPO?=https://github.com/mudler/parakeet.cpp

 GOCMD?=go
@@ -74,6 +74,7 @@ libparakeet.so: sources/parakeet.cpp
 	cmake -B sources/parakeet.cpp/build-shared -S sources/parakeet.cpp $(CMAKE_ARGS)
 	cmake --build sources/parakeet.cpp/build-shared --config Release -j$(JOBS)
 	cp -fv sources/parakeet.cpp/build-shared/libparakeet.so* ./ 2>/dev/null || true
+	cp -fv sources/parakeet.cpp/build-shared/libparakeet.dylib ./ 2>/dev/null || true
 	cp -fv sources/parakeet.cpp/include/parakeet_capi.h ./

 parakeet-cpp-grpc: libparakeet.so main.go goparakeetcpp.go
--- a/backend/go/parakeet-cpp/main.go
+++ b/backend/go/parakeet-cpp/main.go
@@ -2,15 +2,17 @@ package main

 // Started internally by LocalAI - one gRPC server per loaded model.
 //
-// Loads libparakeet.so via purego and registers the flat C-API entry
-// points declared in parakeet_capi.h. The library name can be overridden
-// with PARAKEET_LIBRARY (mirrors the WHISPER_LIBRARY / VIBEVOICECPP_LIBRARY
-// convention in the sibling backends); the default looks for the .so next
-// to this binary.
+// Loads the parakeet shared library via purego and registers the flat
+// C-API entry points declared in parakeet_capi.h. The library name can be
+// overridden with PARAKEET_LIBRARY (mirrors the WHISPER_LIBRARY /
+// VIBEVOICECPP_LIBRARY convention in the sibling backends); the default
+// looks next to this binary for libparakeet.so on Linux and
+// libparakeet.dylib on macOS.
 import (
 	"flag"
 	"fmt"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -28,7 +30,11 @@ type LibFuncs struct {
 func main() {
 	libName := os.Getenv("PARAKEET_LIBRARY")
 	if libName == "" {
-		libName = "libparakeet.so"
+		if runtime.GOOS == "darwin" {
+			libName = "libparakeet.dylib"
+		} else {
+			libName = "libparakeet.so"
+		}
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/parakeet-cpp/package.sh
+++ b/backend/go/parakeet-cpp/package.sh
@@ -16,12 +16,15 @@ mkdir -p "$CURDIR/package/lib"
 cp -avf "$CURDIR/parakeet-cpp-grpc" "$CURDIR/package/"
 cp -avf "$CURDIR/run.sh" "$CURDIR/package/"

-# libparakeet.so + any soname symlinks (libparakeet.so.X[.Y]). purego.Dlopen
-# resolves it via LD_LIBRARY_PATH, which run.sh points at lib/.
-cp -avf "$CURDIR"/libparakeet.so* "$CURDIR/package/lib/" 2>/dev/null || {
-	echo "ERROR: libparakeet.so not found in $CURDIR, run 'make' first" >&2
+# libparakeet shared lib + any soname symlinks. On Linux this is
+# libparakeet.so[.X.Y]; on macOS it is libparakeet.dylib. purego.Dlopen
+# resolves it via (DY)LD_LIBRARY_PATH, which run.sh points at lib/.
+cp -avf "$CURDIR"/libparakeet.so* "$CURDIR/package/lib/" 2>/dev/null || true
+cp -avf "$CURDIR"/libparakeet.dylib "$CURDIR/package/lib/" 2>/dev/null || true
+if ! ls "$CURDIR"/package/lib/libparakeet.* >/dev/null 2>&1; then
+	echo "ERROR: libparakeet shared library not found in $CURDIR, run 'make' first" >&2
 	exit 1
-}
+fi

 # Detect architecture and copy the core runtime libs libparakeet.so links
 # against, plus the matching dynamic loader as lib/ld.so.
@@ -48,7 +51,7 @@ elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
    cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1"
    cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0"
 elif [ "$(uname -s)" = "Darwin" ]; then
-    echo "Detected Darwin"
+    echo "Detected Darwin: system frameworks linked dynamically, no bundled libs needed"
 else
    echo "Error: Could not detect architecture"
    exit 1
--- a/backend/go/parakeet-cpp/run.sh
+++ b/backend/go/parakeet-cpp/run.sh
@@ -3,11 +3,17 @@ set -e

 CURDIR=$(dirname "$(realpath "$0")")

-export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
+if [ "$(uname)" = "Darwin" ]; then
+	export DYLD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${DYLD_LIBRARY_PATH:-}"
+	export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.dylib"
+else
+	export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}"
+	export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.so"
+fi

 # If a self-contained ld.so was packaged, route through it so the
 # packaged libc / libstdc++ are used instead of the host's (matches the
-# whisper backend's runtime layout).
+# whisper backend's runtime layout). Linux only.
 if [ -f "$CURDIR/lib/ld.so" ]; then
 	echo "Using lib/ld.so"
 	exec "$CURDIR/lib/ld.so" "$CURDIR/parakeet-cpp-grpc" "$@"
--- a/backend/go/qwen3-tts-cpp/Makefile
+++ b/backend/go/qwen3-tts-cpp/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # qwentts.cpp version
 QWEN3TTS_REPO?=https://github.com/ServeurpersoCom/qwentts.cpp
-QWEN3TTS_CPP_VERSION?=4536dcdce27c3764a93a06d6bf64026b124962f5
+QWEN3TTS_CPP_VERSION?=9dbe7ea26a01b30fccb117ae5e86807c1dc23d42
 SO_TARGET?=libgoqwen3ttscpp.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -65,8 +65,8 @@ UNAME_S := $(shell uname -s)
 ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = libgoqwen3ttscpp-avx.so libgoqwen3ttscpp-avx2.so libgoqwen3ttscpp-avx512.so libgoqwen3ttscpp-fallback.so
 else
-	# On non-Linux (e.g., Darwin), build only fallback variant
-	VARIANT_TARGETS = libgoqwen3ttscpp-fallback.so
+	# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
+	VARIANT_TARGETS = libgoqwen3ttscpp-fallback.dylib
 endif

 qwen3-tts-cpp: main.go goqwen3ttscpp.go $(VARIANT_TARGETS)
@@ -78,7 +78,7 @@ package: qwen3-tts-cpp
 build: package

 clean: purge
-	rm -rf libgoqwen3ttscpp*.so package sources/qwentts.cpp qwen3-tts-cpp
+	rm -rf libgoqwen3ttscpp*.so libgoqwen3ttscpp*.dylib package sources/qwentts.cpp qwen3-tts-cpp

 purge:
 	rm -rf build*
@@ -110,13 +110,20 @@ libgoqwen3ttscpp-fallback.so: sources/qwentts.cpp
 	SO_TARGET=libgoqwen3ttscpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
 	rm -rf build-libgoqwen3ttscpp-fallback.so

+# Build fallback variant as a dylib (Darwin)
+libgoqwen3ttscpp-fallback.dylib: sources/qwentts.cpp
+	$(info ${GREEN}I qwen3-tts-cpp build info:fallback (dylib)${RESET})
+	SO_TARGET=libgoqwen3ttscpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom
+	rm -rf build-libgoqwen3ttscpp-fallback.dylib
+
 libgoqwen3ttscpp-custom: CMakeLists.txt cpp/goqwen3ttscpp.cpp cpp/goqwen3ttscpp.h
 	mkdir -p build-$(SO_TARGET) && \
 	cd build-$(SO_TARGET) && \
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) --target goqwen3ttscpp && \
 	cd .. && \
-	mv build-$(SO_TARGET)/libgoqwen3ttscpp.so ./$(SO_TARGET)
+	(mv build-$(SO_TARGET)/libgoqwen3ttscpp.so ./$(SO_TARGET) 2>/dev/null || \
+	 mv build-$(SO_TARGET)/libgoqwen3ttscpp.dylib ./$(SO_TARGET) 2>/dev/null)

 test: qwen3-tts-cpp
 	@echo "Running qwen3-tts-cpp tests..."
--- a/backend/go/qwen3-tts-cpp/main.go
+++ b/backend/go/qwen3-tts-cpp/main.go
@@ -4,6 +4,7 @@ package main
 import (
 	"flag"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
 func main() {
 	libName := os.Getenv("QWEN3TTS_LIBRARY")
 	if libName == "" {
-		libName = "./libgoqwen3ttscpp-fallback.so"
+		if runtime.GOOS == "darwin" {
+			libName = "./libgoqwen3ttscpp-fallback.dylib"
+		} else {
+			libName = "./libgoqwen3ttscpp-fallback.so"
+		}
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/qwen3-tts-cpp/package.sh
+++ b/backend/go/qwen3-tts-cpp/package.sh
@@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
 mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/qwen3-tts-cpp $CURDIR/package/
-cp -fv $CURDIR/libgoqwen3ttscpp-*.so $CURDIR/package/
+cp -fv $CURDIR/libgoqwen3ttscpp-*.so $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/libgoqwen3ttscpp-*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/qwen3-tts-cpp/run.sh
+++ b/backend/go/qwen3-tts-cpp/run.sh
@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.so"
+if [ "$(uname)" = "Darwin" ]; then
+	# macOS: single dylib variant (Metal or Accelerate)
+	LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.dylib"
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+else
+	LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.so"

-if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
 		if [ -e $CURDIR/libgoqwen3ttscpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
 			LIBRARY="$CURDIR/libgoqwen3ttscpp-avx512.so"
 		fi
 	fi
+
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 fi

-export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export QWEN3TTS_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
--- a/backend/go/rfdetr-cpp/Makefile
+++ b/backend/go/rfdetr-cpp/Makefile
@@ -71,7 +71,7 @@ ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = librfdetrcpp-avx.so librfdetrcpp-avx2.so librfdetrcpp-avx512.so librfdetrcpp-fallback.so
 else
 	# On non-Linux (e.g., Darwin), build only fallback variant
-	VARIANT_TARGETS = librfdetrcpp-fallback.so
+	VARIANT_TARGETS = librfdetrcpp-fallback.dylib
 endif

 rfdetr-cpp: main.go gorfdetrcpp.go $(VARIANT_TARGETS)
@@ -83,7 +83,7 @@ package: rfdetr-cpp
 build: package

 clean: purge
-	rm -rf librfdetrcpp*.so rfdetr-cpp package sources
+	rm -rf librfdetrcpp*.so librfdetrcpp*.dylib rfdetr-cpp package sources

 purge:
 	rm -rf build*
@@ -110,11 +110,19 @@ librfdetrcpp-avx512.so: sources/rt-detr.cpp
 endif

 # Build fallback variant (all platforms)
+ifeq ($(UNAME_S),Darwin)
+librfdetrcpp-fallback.dylib: sources/rt-detr.cpp
+	rm -rfv build-$@
+	$(info ${GREEN}I rfdetr-cpp build info:fallback${RESET})
+	SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) librfdetrcpp-custom
+	rm -rfv build-$@
+else
 librfdetrcpp-fallback.so: sources/rt-detr.cpp
 	rm -rfv build-$@
 	$(info ${GREEN}I rfdetr-cpp build info:fallback${RESET})
 	SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) librfdetrcpp-custom
 	rm -rfv build-$@
+endif

 librfdetrcpp-custom: CMakeLists.txt
 	mkdir -p build-$(SO_TARGET) && \
@@ -122,7 +130,8 @@ librfdetrcpp-custom: CMakeLists.txt
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	mv build-$(SO_TARGET)/librfdetrcpp.so ./$(SO_TARGET)
+	(mv build-$(SO_TARGET)/librfdetrcpp.so ./$(SO_TARGET) 2>/dev/null || \
+	 mv build-$(SO_TARGET)/librfdetrcpp.dylib ./$(SO_TARGET) 2>/dev/null)

 all: rfdetr-cpp package

--- a/backend/go/rfdetr-cpp/main.go
+++ b/backend/go/rfdetr-cpp/main.go
@@ -9,6 +9,7 @@ package main
 import (
 	"flag"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -27,7 +28,11 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("RFDETR_LIBRARY")
 	if libName == "" {
-		libName = "./librfdetrcpp-fallback.so"
+		if runtime.GOOS == "darwin" {
+			libName = "./librfdetrcpp-fallback.dylib"
+		} else {
+			libName = "./librfdetrcpp-fallback.so"
+		}
 	}

 	rfdetrLib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/rfdetr-cpp/package.sh
+++ b/backend/go/rfdetr-cpp/package.sh
@@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.."
 # Create lib directory
 mkdir -p $CURDIR/package/lib

-cp -avf $CURDIR/librfdetrcpp-*.so $CURDIR/package/
+cp -fv $CURDIR/librfdetrcpp-*.so $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/librfdetrcpp-*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -avf $CURDIR/rfdetr-cpp $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

--- a/backend/go/rfdetr-cpp/run.sh
+++ b/backend/go/rfdetr-cpp/run.sh
@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-LIBRARY="$CURDIR/librfdetrcpp-fallback.so"
+if [ "$(uname)" = "Darwin" ]; then
+	# macOS: single dylib variant (Metal or Accelerate)
+	LIBRARY="$CURDIR/librfdetrcpp-fallback.dylib"
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+else
+	LIBRARY="$CURDIR/librfdetrcpp-fallback.so"

-if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
 		if [ -e $CURDIR/librfdetrcpp-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
 			LIBRARY="$CURDIR/librfdetrcpp-avx512.so"
 		fi
 	fi
+
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 fi

-export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export RFDETR_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
--- a/backend/go/sam3-cpp/Makefile
+++ b/backend/go/sam3-cpp/Makefile
@@ -66,7 +66,7 @@ ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = libgosam3-avx.so libgosam3-avx2.so libgosam3-avx512.so libgosam3-fallback.so
 else
 	# On non-Linux (e.g., Darwin), build only fallback variant
-	VARIANT_TARGETS = libgosam3-fallback.so
+	VARIANT_TARGETS = libgosam3-fallback.dylib
 endif

 sam3-cpp: main.go gosam3.go $(VARIANT_TARGETS)
@@ -78,7 +78,7 @@ package: sam3-cpp
 build: package

 clean: purge
-	rm -rf libgosam3*.so sam3-cpp package sources
+	rm -rf libgosam3*.so libgosam3*.dylib sam3-cpp package sources

 purge:
 	rm -rf build*
@@ -105,11 +105,19 @@ libgosam3-avx512.so: sources/sam3.cpp
 endif

 # Build fallback variant (all platforms)
+ifeq ($(UNAME_S),Darwin)
+libgosam3-fallback.dylib: sources/sam3.cpp
+	$(MAKE) purge
+	$(info ${GREEN}I sam3-cpp build info:fallback${RESET})
+	SO_TARGET=libgosam3-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosam3-custom
+	rm -rfv build*
+else
 libgosam3-fallback.so: sources/sam3.cpp
 	$(MAKE) purge
 	$(info ${GREEN}I sam3-cpp build info:fallback${RESET})
 	SO_TARGET=libgosam3-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosam3-custom
 	rm -rfv build*
+endif

 libgosam3-custom: CMakeLists.txt cpp/gosam3.cpp cpp/gosam3.h
 	mkdir -p build-$(SO_TARGET) && \
@@ -117,6 +125,7 @@ libgosam3-custom: CMakeLists.txt cpp/gosam3.cpp cpp/gosam3.h
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	mv build-$(SO_TARGET)/libgosam3.so ./$(SO_TARGET)
+	(mv build-$(SO_TARGET)/libgosam3.so ./$(SO_TARGET) 2>/dev/null || \
+	 mv build-$(SO_TARGET)/libgosam3.dylib ./$(SO_TARGET) 2>/dev/null)

 all: sam3-cpp package
--- a/backend/go/sam3-cpp/main.go
+++ b/backend/go/sam3-cpp/main.go
@@ -3,6 +3,7 @@ package main
 import (
 	"flag"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("SAM3_LIBRARY")
 	if libName == "" {
-		libName = "./libgosam3-fallback.so"
+		if runtime.GOOS == "darwin" {
+			libName = "./libgosam3-fallback.dylib"
+		} else {
+			libName = "./libgosam3-fallback.so"
+		}
 	}

 	gosamLib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/sam3-cpp/package.sh
+++ b/backend/go/sam3-cpp/package.sh
@@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.."
 # Create lib directory
 mkdir -p $CURDIR/package/lib

-cp -avf $CURDIR/libgosam3-*.so $CURDIR/package/
+cp -fv $CURDIR/libgosam3-*.so $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/libgosam3-*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -avf $CURDIR/sam3-cpp $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

--- a/backend/go/sam3-cpp/run.sh
+++ b/backend/go/sam3-cpp/run.sh
@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-LIBRARY="$CURDIR/libgosam3-fallback.so"
+if [ "$(uname)" = "Darwin" ]; then
+	# macOS: single dylib variant (Metal or Accelerate)
+	LIBRARY="$CURDIR/libgosam3-fallback.dylib"
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+else
+	LIBRARY="$CURDIR/libgosam3-fallback.so"

-if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
 		if [ -e $CURDIR/libgosam3-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
 			LIBRARY="$CURDIR/libgosam3-avx512.so"
 		fi
 	fi
+
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 fi

-export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export SAM3_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
--- a/backend/go/sherpa-onnx/backend.go
+++ b/backend/go/sherpa-onnx/backend.go
@@ -7,6 +7,7 @@ import (
 	"fmt"
 	"os"
 	"path/filepath"
+	"runtime"
 	"strconv"
 	"strings"
 	"sync"
@@ -238,11 +239,19 @@ func loadSherpaLibs() error {
 func loadSherpaLibsOnce() error {
 	shimLib := os.Getenv("SHERPA_SHIM_LIBRARY")
 	if shimLib == "" {
-		shimLib = "libsherpa-shim.so"
+		if runtime.GOOS == "darwin" {
+			shimLib = "libsherpa-shim.dylib"
+		} else {
+			shimLib = "libsherpa-shim.so"
+		}
 	}
 	capiLib := os.Getenv("SHERPA_ONNX_LIBRARY")
 	if capiLib == "" {
-		capiLib = "libsherpa-onnx-c-api.so"
+		if runtime.GOOS == "darwin" {
+			capiLib = "libsherpa-onnx-c-api.dylib"
+		} else {
+			capiLib = "libsherpa-onnx-c-api.so"
+		}
 	}

 	shim, err := purego.Dlopen(shimLib, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/sherpa-onnx/run.sh
+++ b/backend/go/sherpa-onnx/run.sh
@@ -3,7 +3,13 @@ set -ex

 CURDIR=$(dirname "$(realpath $0)")

-export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
+if [ "$(uname)" = "Darwin" ]; then
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+	export SHERPA_SHIM_LIBRARY=$CURDIR/lib/libsherpa-shim.dylib
+	export SHERPA_ONNX_LIBRARY=$CURDIR/lib/libsherpa-onnx-c-api.dylib
+else
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
+fi

 if [ -f $CURDIR/lib/ld.so ]; then
 	echo "Using lib/ld.so"
--- a/backend/go/stablediffusion-ggml/Makefile
+++ b/backend/go/stablediffusion-ggml/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # stablediffusion.cpp (ggml)
 STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=b12098f5d09fc83da36e65c784f7bdb16a5a5ebf
+STABLEDIFFUSION_GGML_VERSION?=8caa3f908ae6d4a4bef531e73b9a969f266a3d1f

 CMAKE_ARGS+=-DGGML_MAX_NAME=128

@@ -131,6 +131,7 @@ libgosd-custom: CMakeLists.txt cpp/gosd.cpp cpp/gosd.h
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	mv build-$(SO_TARGET)/libgosd.so ./$(SO_TARGET)
+	(mv build-$(SO_TARGET)/libgosd.so ./$(SO_TARGET) 2>/dev/null || \
+	 mv build-$(SO_TARGET)/libgosd.dylib ./$(SO_TARGET) 2>/dev/null)

 all: stablediffusion-ggml package
--- a/backend/go/stablediffusion-ggml/main.go
+++ b/backend/go/stablediffusion-ggml/main.go
@@ -3,6 +3,7 @@ package main
 import (
 	"flag"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("SD_LIBRARY")
 	if libName == "" {
-		libName = "./libgosd-fallback.so"
+		if runtime.GOOS == "darwin" {
+			libName = "./libgosd-fallback.dylib"
+		} else {
+			libName = "./libgosd-fallback.so"
+		}
 	}

 	gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/stablediffusion-ggml/package.sh
+++ b/backend/go/stablediffusion-ggml/package.sh
@@ -12,6 +12,7 @@ REPO_ROOT="${CURDIR}/../../.."
 mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/libgosd-*.so $CURDIR/package/
+cp -fv $CURDIR/libgosd-*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -avf $CURDIR/stablediffusion-ggml $CURDIR/package/
 cp -fv $CURDIR/run.sh $CURDIR/package/

--- a/backend/go/stablediffusion-ggml/run.sh
+++ b/backend/go/stablediffusion-ggml/run.sh
@@ -12,9 +12,18 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-LIBRARY="$CURDIR/libgosd-fallback.so"
+if [ "$(uname)" = "Darwin" ]; then
+	# macOS: single library variant (Metal or Accelerate). On Apple, CMake
+	# emits a .dylib for a SHARED library but a .so for a MODULE library
+	# (gosd is a MODULE), so prefer .dylib and fall back to .so.
+	LIBRARY="$CURDIR/libgosd-fallback.dylib"
+	if [ ! -e "$LIBRARY" ]; then
+		LIBRARY="$CURDIR/libgosd-fallback.so"
+	fi
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+else
+	LIBRARY="$CURDIR/libgosd-fallback.so"

-if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
 		if [ -e $CURDIR/libgosd-avx.so ]; then
@@ -36,9 +45,10 @@ if [ "$(uname)" != "Darwin" ]; then
 			LIBRARY="$CURDIR/libgosd-avx512.so"
 		fi
 	fi
+
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 fi

-export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export SD_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
--- a/backend/go/supertonic/helper.go
+++ b/backend/go/supertonic/helper.go
@@ -16,6 +16,7 @@ import (
 	"os"
 	"path/filepath"
 	"regexp"
+	"runtime"
 	"strings"
 	"time"
 	"unicode"
@@ -943,7 +944,13 @@ func InitializeONNXRuntime() error {
 			}
 		}
 		if libPath == "" {
-			libPath = "/usr/local/lib/libonnxruntime.so"
+			// LocalAI: default to the platform-native shared library
+			// extension when nothing else is found (dyld vs ld.so).
+			if runtime.GOOS == "darwin" {
+				libPath = "/usr/local/lib/libonnxruntime.dylib"
+			} else {
+				libPath = "/usr/local/lib/libonnxruntime.so"
+			}
 		}
 	}
 	ort.SetSharedLibraryPath(libPath)
--- a/backend/go/supertonic/package.sh
+++ b/backend/go/supertonic/package.sh
@@ -32,6 +32,10 @@ elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
    cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
    cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
    cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
+elif [ "$(uname -s)" = "Darwin" ]; then
+    # macOS: dyld resolves the bundled .dylib via DYLD_LIBRARY_PATH (set in
+    # run.sh); there is no ld.so loader nor glibc to bundle.
+    echo "Detected Darwin"
 else
    echo "Error: Could not detect architecture"
    exit 1
--- a/backend/go/supertonic/run.sh
+++ b/backend/go/supertonic/run.sh
@@ -3,12 +3,19 @@ set -ex

 CURDIR=$(dirname "$(realpath $0)")

-export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
-export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.so
+if [ "$(uname)" = "Darwin" ]; then
+	# macOS uses dyld: there is no ld.so loader, and the search path env
+	# var is DYLD_LIBRARY_PATH. ONNX Runtime ships as a .dylib here.
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+	export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.dylib
+else
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
+	export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.so

-if [ -f $CURDIR/lib/ld.so ]; then
-	echo "Using lib/ld.so"
-	exec $CURDIR/lib/ld.so $CURDIR/supertonic "$@"
+	if [ -f $CURDIR/lib/ld.so ]; then
+		echo "Using lib/ld.so"
+		exec $CURDIR/lib/ld.so $CURDIR/supertonic "$@"
+	fi
 fi

 exec $CURDIR/supertonic "$@"
--- a/backend/go/vibevoice-cpp/Makefile
+++ b/backend/go/vibevoice-cpp/Makefile
@@ -70,8 +70,8 @@ UNAME_S := $(shell uname -s)
 ifeq ($(UNAME_S),Linux)
 	VARIANT_TARGETS = libgovibevoicecpp-avx.so libgovibevoicecpp-avx2.so libgovibevoicecpp-avx512.so libgovibevoicecpp-fallback.so
 else
-	# On non-Linux (e.g., Darwin), build only fallback variant
-	VARIANT_TARGETS = libgovibevoicecpp-fallback.so
+	# On non-Linux (e.g., Darwin), build only fallback variant (as a dylib)
+	VARIANT_TARGETS = libgovibevoicecpp-fallback.dylib
 endif

 vibevoice-cpp: main.go govibevoicecpp.go $(VARIANT_TARGETS)
@@ -83,7 +83,7 @@ package: vibevoice-cpp
 build: package

 clean: purge
-	rm -rf libgovibevoicecpp*.so package sources/vibevoice.cpp vibevoice-cpp
+	rm -rf libgovibevoicecpp*.so libgovibevoicecpp*.dylib package sources/vibevoice.cpp vibevoice-cpp

 purge:
 	rm -rf build*
@@ -119,13 +119,21 @@ libgovibevoicecpp-fallback.so: sources/vibevoice.cpp
 	SO_TARGET=libgovibevoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgovibevoicecpp-custom
 	rm -rfv build*

+# Build fallback variant as a dylib (Darwin)
+libgovibevoicecpp-fallback.dylib: sources/vibevoice.cpp
+	$(MAKE) purge
+	$(info ${GREEN}I vibevoice-cpp build info:fallback (dylib)${RESET})
+	SO_TARGET=libgovibevoicecpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgovibevoicecpp-custom
+	rm -rfv build*
+
 libgovibevoicecpp-custom: CMakeLists.txt cpp/govibevoicecpp.cpp cpp/govibevoicecpp.h
 	mkdir -p build-$(SO_TARGET) && \
 	cd build-$(SO_TARGET) && \
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) --target govibevoicecpp && \
 	cd .. && \
-	mv build-$(SO_TARGET)/libgovibevoicecpp.so ./$(SO_TARGET)
+	(mv build-$(SO_TARGET)/libgovibevoicecpp.so ./$(SO_TARGET) 2>/dev/null || \
+	 mv build-$(SO_TARGET)/libgovibevoicecpp.dylib ./$(SO_TARGET) 2>/dev/null)

 test: vibevoice-cpp
 	@echo "Running vibevoice-cpp tests..."
--- a/backend/go/vibevoice-cpp/main.go
+++ b/backend/go/vibevoice-cpp/main.go
@@ -4,6 +4,7 @@ package main
 import (
 	"flag"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -21,7 +22,11 @@ type LibFuncs struct {
 func main() {
 	libName := os.Getenv("VIBEVOICECPP_LIBRARY")
 	if libName == "" {
-		libName = "./libgovibevoicecpp-fallback.so"
+		if runtime.GOOS == "darwin" {
+			libName = "./libgovibevoicecpp-fallback.dylib"
+		} else {
+			libName = "./libgovibevoicecpp-fallback.so"
+		}
 	}

 	lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
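
The default-library choice in this hunk can be isolated into a small pure function so the OS branch is testable without loading anything. The following is only a sketch of that idea: `defaultLibraryName` and `resolveLibrary` are hypothetical helpers that do not exist in the patch, and the base name is passed in as a parameter for illustration.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// defaultLibraryName returns the fallback shared-library path for the given
// GOOS value. Hypothetical helper: the patch inlines this branch in main().
func defaultLibraryName(goos, base string) string {
	if goos == "darwin" {
		return "./" + base + "-fallback.dylib"
	}
	return "./" + base + "-fallback.so"
}

// resolveLibrary prefers an explicit override (the env var in the patch) and
// only falls back to the platform default when the override is empty.
func resolveLibrary(override, goos, base string) string {
	if override != "" {
		return override
	}
	return defaultLibraryName(goos, base)
}

func main() {
	lib := resolveLibrary(os.Getenv("VIBEVOICECPP_LIBRARY"), runtime.GOOS, "libgovibevoicecpp")
	fmt.Println("would dlopen:", lib)
}
```

Passing `goos` as an argument, rather than reading `runtime.GOOS` inside the helper, is what lets the darwin branch be exercised on a Linux CI runner.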
--- a/backend/go/vibevoice-cpp/package.sh
+++ b/backend/go/vibevoice-cpp/package.sh
@@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
 mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/vibevoice-cpp $CURDIR/package/
-cp -fv $CURDIR/libgovibevoicecpp-*.so $CURDIR/package/
+cp -fv $CURDIR/libgovibevoicecpp-*.so $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/libgovibevoicecpp-*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/vibevoice-cpp/run.sh
+++ b/backend/go/vibevoice-cpp/run.sh
@@ -11,9 +11,13 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-LIBRARY="$CURDIR/libgovibevoicecpp-fallback.so"
+if [ "$(uname)" = "Darwin" ]; then
+	# macOS: single dylib variant (Metal or Accelerate)
+	LIBRARY="$CURDIR/libgovibevoicecpp-fallback.dylib"
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+else
+	LIBRARY="$CURDIR/libgovibevoicecpp-fallback.so"

-if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
 		if [ -e $CURDIR/libgovibevoicecpp-avx.so ]; then
@@ -34,9 +38,10 @@ if [ "$(uname)" != "Darwin" ]; then
 			LIBRARY="$CURDIR/libgovibevoicecpp-avx512.so"
 		fi
 	fi
+
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 fi

-export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export VIBEVOICECPP_LIBRARY=$LIBRARY

 if [ -f $CURDIR/lib/ld.so ]; then
--- a/backend/go/whisper/Makefile
+++ b/backend/go/whisper/Makefile
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

 # whisper.cpp version
 WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
-WHISPER_CPP_VERSION?=5ed76e9a079962f1c85cfce44edd325c27ef1f97
+WHISPER_CPP_VERSION?=43d78af5be58f41d6ffbc227d608f104577741ea
 SO_TARGET?=libgowhisper.so

 CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -117,6 +117,7 @@ libgowhisper-custom: CMakeLists.txt cpp/gowhisper.cpp cpp/gowhisper.h
 	cmake .. $(CMAKE_ARGS) && \
 	cmake --build . --config Release -j$(JOBS) && \
 	cd .. && \
-	mv build-$(SO_TARGET)/libgowhisper.so ./$(SO_TARGET)
+	(mv build-$(SO_TARGET)/libgowhisper.so ./$(SO_TARGET) 2>/dev/null || \
+	 mv build-$(SO_TARGET)/libgowhisper.dylib ./$(SO_TARGET:.so=.dylib))

 all: whisper package
--- a/backend/go/whisper/main.go
+++ b/backend/go/whisper/main.go
@@ -4,6 +4,7 @@ package main
 import (
 	"flag"
 	"os"
+	"runtime"

 	"github.com/ebitengine/purego"
 	grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -22,7 +23,11 @@ func main() {
 	// Get library name from environment variable, default to fallback
 	libName := os.Getenv("WHISPER_LIBRARY")
 	if libName == "" {
-		libName = "./libgowhisper-fallback.so"
+		if runtime.GOOS == "darwin" {
+			libName = "./libgowhisper-fallback.dylib"
+		} else {
+			libName = "./libgowhisper-fallback.so"
+		}
 	}

 	gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
--- a/backend/go/whisper/package.sh
+++ b/backend/go/whisper/package.sh
@@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
 mkdir -p $CURDIR/package/lib

 cp -avf $CURDIR/whisper $CURDIR/package/
-cp -fv $CURDIR/libgowhisper-*.so $CURDIR/package/
+cp -fv $CURDIR/libgowhisper-*.so $CURDIR/package/ 2>/dev/null || true
+cp -fv $CURDIR/libgowhisper-*.dylib $CURDIR/package/ 2>/dev/null || true
 cp -fv $CURDIR/run.sh $CURDIR/package/

 # Detect architecture and copy appropriate libraries
--- a/backend/go/whisper/run.sh
+++ b/backend/go/whisper/run.sh
@@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then
 	grep -e "flags" /proc/cpuinfo | head -1
 fi

-LIBRARY="$CURDIR/libgowhisper-fallback.so"
+if [ "$(uname)" = "Darwin" ]; then
+	# macOS: single dylib variant (Metal or Accelerate)
+	LIBRARY="$CURDIR/libgowhisper-fallback.dylib"
+	export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
+else
+	LIBRARY="$CURDIR/libgowhisper-fallback.so"

-if [ "$(uname)" != "Darwin" ]; then
 	if grep -q -e "\savx\s" /proc/cpuinfo ; then
 		echo "CPU:    AVX    found OK"
 		if [ -e $CURDIR/libgowhisper-avx.so ]; then
@@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then
 			LIBRARY="$CURDIR/libgowhisper-avx512.so"
 		fi
 	fi
+
+	export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 fi

-export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
 export WHISPER_LIBRARY=$LIBRARY

 # If there is a lib/ld.so, use it
--- a/backend/index.yaml
+++ b/backend/index.yaml
@@ -1284,6 +1284,7 @@
    nvidia-cuda-13: "cuda13-liquid-audio"
    nvidia-cuda-12: "cuda12-liquid-audio"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-liquid-audio"
+    metal: "metal-liquid-audio"
  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/61b8e2ba285851687028d395/7_6D7rWrLxp2hb6OHSV1p.png
 - &qwen-tts
  urls:
@@ -1569,6 +1570,7 @@
    - TTS
  capabilities:
    default: "cpu-supertonic"
+    metal: "metal-supertonic"
 - !!merge <<: *neutts
  name: "neutts-development"
  capabilities:
@@ -4612,6 +4614,7 @@
    nvidia-cuda-13: "cuda13-liquid-audio-development"
    nvidia-cuda-12: "cuda12-liquid-audio-development"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-liquid-audio-development"
+    metal: "metal-liquid-audio-development"
 - !!merge <<: *liquid-audio
  name: "cpu-liquid-audio"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-liquid-audio"
@@ -4622,6 +4625,16 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-liquid-audio"
  mirrors:
    - localai/localai-backends:master-cpu-liquid-audio
+- !!merge <<: *liquid-audio
+  name: "metal-liquid-audio"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-liquid-audio"
+  mirrors:
+    - localai/localai-backends:latest-metal-darwin-arm64-liquid-audio
+- !!merge <<: *liquid-audio
+  name: "metal-liquid-audio-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-liquid-audio"
+  mirrors:
+    - localai/localai-backends:master-metal-darwin-arm64-liquid-audio
 - !!merge <<: *liquid-audio
  name: "cuda12-liquid-audio"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-liquid-audio"
@@ -5282,6 +5295,7 @@
    nvidia: "cuda12-trl"
    nvidia-cuda-12: "cuda12-trl"
    nvidia-cuda-13: "cuda13-trl"
+    metal: "metal-trl"
 ## TRL backend images
 - !!merge <<: *trl
  name: "cpu-trl"
@@ -5313,6 +5327,16 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-trl"
  mirrors:
    - localai/localai-backends:master-gpu-nvidia-cuda-13-trl
+- !!merge <<: *trl
+  name: "metal-trl"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-trl"
+  mirrors:
+    - localai/localai-backends:latest-metal-darwin-arm64-trl
+- !!merge <<: *trl
+  name: "metal-trl-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-trl"
+  mirrors:
+    - localai/localai-backends:master-metal-darwin-arm64-trl
 ## llama.cpp quantization backend
 - &llama-cpp-quantization
  name: "llama-cpp-quantization"
@@ -5484,6 +5508,7 @@
  name: "supertonic-development"
  capabilities:
    default: "cpu-supertonic-development"
+    metal: "metal-supertonic-development"
 - !!merge <<: *supertonic
  name: "cpu-supertonic"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-supertonic"
@@ -5494,3 +5519,13 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-supertonic"
  mirrors:
    - localai/localai-backends:master-cpu-supertonic
+- !!merge <<: *supertonic
+  name: "metal-supertonic"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-supertonic"
+  mirrors:
+    - localai/localai-backends:latest-metal-darwin-arm64-supertonic
+- !!merge <<: *supertonic
+  name: "metal-supertonic-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-supertonic"
+  mirrors:
+    - localai/localai-backends:master-metal-darwin-arm64-supertonic
--- a/backend/python/liquid-audio/install.sh
+++ b/backend/python/liquid-audio/install.sh
@@ -14,5 +14,11 @@ else
 fi

 # liquid-audio's torch wheels are large; allow upgrades to satisfy transitive pins
-EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
+EXTRA_PIP_INSTALL_FLAGS+=" --upgrade"
+# --index-strategy is a uv-only flag. The darwin/MPS build installs with pip
+# (USE_PIP=true in scripts/build/python-darwin.sh), which rejects it. Only add
+# it on the uv path; Linux/CUDA resolution is unchanged.
+if [ "x${USE_PIP:-}" != "xtrue" ]; then
+    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-first-match"
+fi
 installRequirements
--- a/backend/python/liquid-audio/requirements-mps.txt
+++ b/backend/python/liquid-audio/requirements-mps.txt
@@ -1,3 +1,4 @@
+# MPS (Apple Silicon / Metal) build profile - installed by the darwin CI job.
 torch>=2.8.0
 torchaudio>=2.8.0
 torchcodec>=0.9.1
--- a/backend/python/trl/install.sh
+++ b/backend/python/trl/install.sh
@@ -8,7 +8,13 @@ else
    source $backend_dir/../common/libbackend.sh
 fi

-EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
+EXTRA_PIP_INSTALL_FLAGS+=" --upgrade"
+# --index-strategy is a uv-only flag. The darwin/MPS build installs with pip
+# (USE_PIP=true in scripts/build/python-darwin.sh), which rejects it. Only add
+# it when uv is the installer, keeping the Linux/CUDA resolution unchanged.
+if [ "x${USE_PIP:-}" != "xtrue" ]; then
+    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-first-match"
+fi
 installRequirements

 # Fetch convert_hf_to_gguf.py and gguf package from the same llama.cpp version
--- a/backend/python/trl/requirements-mps.txt
+++ b/backend/python/trl/requirements-mps.txt
@@ -0,0 +1,12 @@
+torch==2.10.0
+trl
+peft
+datasets>=3.0.0
+transformers>=4.56.2
+accelerate>=1.4.0
+huggingface-hub>=1.3.0
+sentencepiece
+# Note: bitsandbytes is intentionally omitted on MPS. It is only used by the
+# CUDA (cublas) variants for 8-bit/4-bit quantization and has poor support on
+# Apple Silicon. torch here uses the plain PyPI wheels, which ship MPS support
+# on macOS arm64.
--- a/core/application/config_file_watcher.go
+++ b/core/application/config_file_watcher.go
@@ -215,6 +215,7 @@ func readRuntimeSettingsJson(startupAppConfig config.ApplicationConfig) fileHand
 		envBackendGalleries := slices.Equal(appConfig.BackendGalleries, startupAppConfig.BackendGalleries)
 		envAutoloadGalleries := appConfig.AutoloadGalleries == startupAppConfig.AutoloadGalleries
 		envAutoloadBackendGalleries := appConfig.AutoloadBackendGalleries == startupAppConfig.AutoloadBackendGalleries
+		envPIIDefaultDetectors := slices.Equal(appConfig.PIIDefaultDetectors, startupAppConfig.PIIDefaultDetectors)
 		envAgentJobRetentionDays := appConfig.AgentJobRetentionDays == startupAppConfig.AgentJobRetentionDays
 		envForceEvictionWhenBusy := appConfig.ForceEvictionWhenBusy == startupAppConfig.ForceEvictionWhenBusy
 		envLRUEvictionMaxRetries := appConfig.LRUEvictionMaxRetries == startupAppConfig.LRUEvictionMaxRetries
@@ -335,6 +336,15 @@ func readRuntimeSettingsJson(startupAppConfig config.ApplicationConfig) fileHand
 			if settings.AutoloadBackendGalleries != nil && !envAutoloadBackendGalleries {
 				appConfig.AutoloadBackendGalleries = *settings.AutoloadBackendGalleries
 			}
+			if settings.PIIDefaultDetectors != nil && !envPIIDefaultDetectors {
+				// Request-side default redaction reads this live via
+				// ResolvePIIPolicy, so a file edit takes effect on the next chat
+				// request. The MITM listener resolves its per-host detector map
+				// once at start, so a raw file edit reaches cloud-proxy traffic
+				// only after a restart or a POST /api/settings (which rebuilds
+				// the listener) — the admin UI uses the latter.
+				appConfig.PIIDefaultDetectors = append([]string(nil), (*settings.PIIDefaultDetectors)...)
+			}
 			if settings.AutoUpgradeBackends != nil {
 				appConfig.AutoUpgradeBackends = *settings.AutoUpgradeBackends
 			}
--- a/core/application/runtime_settings_branding_test.go
+++ b/core/application/runtime_settings_branding_test.go
@@ -109,6 +109,52 @@ var _ = Describe("loadRuntimeSettingsFromFile", func() {
 		})
 	})

+	// Instance-wide default PII detectors. Unless LOCALAI_PII_DEFAULT_DETECTORS
+	// is set, the file is the only source, and the loader runs just before
+	// startMITMIfConfigured. A regression here means the cloud-proxy MITM
+	// listener resolves an empty detector set at boot and forwards intercepted
+	// traffic unredacted, even though pii_default_detectors is on disk and the
+	// MITM model has PII enabled. Request-side default redaction breaks too.
+	Describe("PII default detectors", func() {
+		It("loads pii_default_detectors from the file", func() {
+			cfg := &config.ApplicationConfig{DynamicConfigsDir: seedSettings(`{"pii_default_detectors": ["privacy-filter-nemotron", "secret-filter"]}`)}
+			loadRuntimeSettingsFromFile(cfg)
+			Expect(cfg.PIIDefaultDetectors).To(Equal([]string{"privacy-filter-nemotron", "secret-filter"}))
+		})
+
+		It("does not override an env/CLI-set value (LOCALAI_PII_DEFAULT_DETECTORS)", func() {
+			cfg := &config.ApplicationConfig{
+				DynamicConfigsDir:   seedSettings(`{"pii_default_detectors": ["from-file"]}`),
+				PIIDefaultDetectors: []string{"from-env"}, // simulate WithPIIDefaultDetectors(env)
+			}
+			loadRuntimeSettingsFromFile(cfg)
+			Expect(cfg.PIIDefaultDetectors).To(Equal([]string{"from-env"}), "env var must win over the persisted file value")
+		})
+	})
+
+	// The live file watcher applies pii_default_detectors on a runtime change
+	// the same way it handles galleries/threads/etc.: env-set values (current
+	// == startup snapshot) are left alone, otherwise the file value is applied
+	// to the live config so request-side default redaction picks it up without
+	// a restart.
+	Describe("file watcher: pii_default_detectors", func() {
+		It("applies a changed file value to the live config", func() {
+			startup := config.ApplicationConfig{} // no env baseline
+			live := &config.ApplicationConfig{PIIDefaultDetectors: []string{"old"}}
+			handler := readRuntimeSettingsJson(startup)
+			Expect(handler([]byte(`{"pii_default_detectors":["new-a","new-b"]}`), live)).To(Succeed())
+			Expect(live.PIIDefaultDetectors).To(Equal([]string{"new-a", "new-b"}))
+		})
+
+		It("leaves an env-controlled value untouched", func() {
+			startup := config.ApplicationConfig{PIIDefaultDetectors: []string{"from-env"}}
+			live := &config.ApplicationConfig{PIIDefaultDetectors: []string{"from-env"}}
+			handler := readRuntimeSettingsJson(startup)
+			Expect(handler([]byte(`{"pii_default_detectors":["from-file"]}`), live)).To(Succeed())
+			Expect(live.PIIDefaultDetectors).To(Equal([]string{"from-env"}), "env-controlled detectors must not be overwritten by the file")
+		})
+	})
+
 	// The Agent Pool block has a mix of zero and non-zero defaults
 	// (Enabled=true, EmbeddingModel="granite-...", MaxChunkingSize=400,
 	// VectorEngine="chromem", AgentHubURL="https://agenthub.localai.io").
--- a/core/application/startup.go
+++ b/core/application/startup.go
@@ -750,6 +750,20 @@ func loadRuntimeSettingsFromFile(options *config.ApplicationConfig) {
 		options.MITMListen = *settings.MITMListen
 	}

+	// Instance-wide default PII detectors. LOCALAI_PII_DEFAULT_DETECTORS (via
+	// WithPIIDefaultDetectors) wins when set; otherwise the file is the source
+	// — apply it only when the env/CLI left the value empty, mirroring the
+	// "env > file" precedence used for the other fields. This must land before
+	// startMITMIfConfigured (called right after this loader): the cloud-proxy
+	// listener resolves each intercept host's detectors once at start via
+	// ResolvePIIPolicy, and a MITM model that names no detectors of its own
+	// falls back to these defaults. Without it the listener (and request-side
+	// default redaction) starts with an empty detector set and forwards
+	// traffic unredacted even though pii_default_detectors is on disk.
+	if settings.PIIDefaultDetectors != nil && len(options.PIIDefaultDetectors) == 0 {
+		options.PIIDefaultDetectors = append([]string(nil), (*settings.PIIDefaultDetectors)...)
+	}
+
 	// Backend upgrade flags
 	if settings.AutoUpgradeBackends != nil {
 		if !options.AutoUpgradeBackends {
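
The "env > file" rule applied to `PIIDefaultDetectors` in this hunk can be sketched in isolation as follows. This is a minimal illustration, not the loader itself: `mergeDetectors` is a hypothetical name, and the pointer-to-slice parameter only mirrors the shape of `settings.PIIDefaultDetectors` as it appears in the diff.

```go
package main

import "fmt"

// mergeDetectors applies the precedence used by the loader: a non-empty
// env/CLI value wins; otherwise the file value (if present) is copied in.
// Hypothetical helper; the patch performs this check inline.
func mergeDetectors(fromEnv []string, fromFile *[]string) []string {
	if len(fromEnv) > 0 || fromFile == nil {
		return fromEnv
	}
	// Copy rather than alias, so later edits to the decoded settings
	// struct cannot change the live configuration.
	return append([]string(nil), (*fromFile)...)
}

func main() {
	file := []string{"privacy-filter-nemotron", "secret-filter"}
	fmt.Println(mergeDetectors(nil, &file))                  // file value applies
	fmt.Println(mergeDetectors([]string{"from-env"}, &file)) // env value wins
}
```

The `append([]string(nil), ...)` idiom is the same defensive copy the patch uses, which keeps the application config independent of the decoded settings value.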
--- a/core/cli/run.go
+++ b/core/cli/run.go
@@ -140,7 +140,7 @@ type RunCMD struct {
 	OIDCIssuer           string `env:"LOCALAI_OIDC_ISSUER" help:"OIDC issuer URL for auto-discovery" group:"auth"`
 	OIDCClientID         string `env:"LOCALAI_OIDC_CLIENT_ID" help:"OIDC Client ID (auto-enables auth)" group:"auth"`
 	OIDCClientSecret     string `env:"LOCALAI_OIDC_CLIENT_SECRET" help:"OIDC Client Secret" group:"auth"`
-	AuthBaseURL          string `env:"LOCALAI_BASE_URL" help:"Base URL for OAuth callbacks (e.g. http://localhost:8080)" group:"auth"`
+	ExternalBaseURL      string `env:"LOCALAI_BASE_URL" help:"External base URL of this instance (e.g. http://localhost:8080). Used for OAuth callbacks and self-referential links (generated images/videos, job status). When unset, derived from X-Forwarded-Proto/Host or Forwarded headers." group:"api"`
 	AuthAdminEmail       string `env:"LOCALAI_ADMIN_EMAIL" help:"Email address to auto-promote to admin role" group:"auth"`
 	AuthRegistrationMode string `env:"LOCALAI_REGISTRATION_MODE" default:"open" help:"Registration mode: 'open' (default), 'approval', or 'invite' (invite code required)" group:"auth"`
 	DisableLocalAuth     bool   `env:"LOCALAI_DISABLE_LOCAL_AUTH" default:"false" help:"Disable local email/password registration and login (use with OAuth/OIDC-only setups)" group:"auth"`
@@ -181,6 +181,8 @@ type RunCMD struct {
 	// Cloud-proxy MITM listener (off by default).
 	MITMListen string `env:"LOCALAI_MITM_LISTEN" help:"Address (host:port) for the cloudproxy MITM listener. Empty = disabled. Clients set HTTPS_PROXY=http://<this>:<port>. Intercept hosts are declared per-model via the model YAML mitm.hosts: block; create one from the Add Model UI." group:"middleware"`
 	MITMCADir  string `env:"LOCALAI_MITM_CA_DIR" type:"path" help:"Directory holding the MITM proxy CA cert + key. Defaults to <data-path>/mitm-ca." group:"middleware"`
+
+	PIIDefaultDetectors []string `env:"LOCALAI_PII_DEFAULT_DETECTORS" help:"Instance-wide default PII/secret detector model names applied to any PII-enabled model (chiefly cloud-proxy / MITM models) that names no pii.detectors of its own. Comma-separated, e.g. privacy-filter-nemotron,secret-filter. Takes precedence over the value persisted via the Middleware UI." group:"middleware"`
 }

 func (r *RunCMD) Run(ctx *cliContext.Context) error {
@@ -243,6 +245,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
 		config.WithAPIAddress(r.Address),
 		config.WithMITMListen(r.MITMListen),
 		config.WithMITMCADir(r.MITMCADir),
+		config.WithPIIDefaultDetectors(r.PIIDefaultDetectors),
 		config.WithAgentJobRetentionDays(r.AgentJobRetentionDays),
 		config.WithLlamaCPPTunnelCallback(func(tunnels []string) {
 			tunnelEnvVar := strings.Join(tunnels, ",")
@@ -500,9 +503,6 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
 			opts = append(opts, config.WithAuthOIDCClientID(r.OIDCClientID))
 			opts = append(opts, config.WithAuthOIDCClientSecret(r.OIDCClientSecret))
 		}
-		if r.AuthBaseURL != "" {
-			opts = append(opts, config.WithAuthBaseURL(r.AuthBaseURL))
-		}
 		if r.AuthAdminEmail != "" {
 			opts = append(opts, config.WithAuthAdminEmail(r.AuthAdminEmail))
 		}
@@ -520,6 +520,12 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
 		}
 	}

+	// Applied unconditionally: the external base URL governs all self-referential
+	// links (not just OAuth callbacks), so it must take effect even when auth is off.
+	if r.ExternalBaseURL != "" {
+		opts = append(opts, config.WithExternalBaseURL(r.ExternalBaseURL))
+	}
+
 	if idleWatchDog || busyWatchDog {
 		opts = append(opts, config.EnableWatchDog)
 		if idleWatchDog {
--- a/core/config/application_config.go
+++ b/core/config/application_config.go
@@ -49,6 +49,13 @@ type ApplicationConfig struct {
 	P2PNetworkID                  string
 	Federated                     bool

+	// ExternalBaseURL is the externally visible base URL of this instance
+	// (scheme+host[:port]), set via LOCALAI_BASE_URL. When non-empty it is
+	// authoritative for every self-referential URL LocalAI emits (OAuth
+	// callbacks, generated image/video links, async job StatusURLs),
+	// overriding proxy-header detection. Empty = derive from request headers.
+	ExternalBaseURL string
+
 	// DisableStats turns off per-request token tracking. By default the
 	// routing module's billing recorder runs in every mode (including
 	// no-auth single-user) so dashboards and `/api/usage` are immediately
@@ -196,7 +203,6 @@ type AuthConfig struct {
 	OIDCIssuer          string // OIDC issuer URL for auto-discovery (e.g. https://accounts.google.com)
 	OIDCClientID        string
 	OIDCClientSecret    string
-	BaseURL             string // for OAuth callback URLs (e.g. "http://localhost:8080")
 	AdminEmail          string // auto-promote to admin on login
 	RegistrationMode    string // "open", "approval" (default when empty), "invite"
 	DisableLocalAuth    bool   // disable local email/password registration and login
@@ -712,6 +718,18 @@ func WithMITMCADir(dir string) AppOption {
 	}
 }

+// WithPIIDefaultDetectors sets the instance-wide default PII/secret detector
+// model names applied to any PII-enabled model (chiefly cloud-proxy / MITM
+// models) that names no pii.detectors of its own. CLI/env:
+// LOCALAI_PII_DEFAULT_DETECTORS. Empty leaves the value to
+// runtime_settings.json / the Middleware UI; a non-empty value takes
+// precedence over the file (env > file).
+func WithPIIDefaultDetectors(detectors []string) AppOption {
+	return func(o *ApplicationConfig) {
+		o.PIIDefaultDetectors = detectors
+	}
+}
+
 func WithDynamicConfigDir(dynamicConfigsDir string) AppOption {
 	return func(o *ApplicationConfig) {
 		o.DynamicConfigsDir = dynamicConfigsDir
@@ -938,9 +956,9 @@ func WithAuthGitHubClientSecret(clientSecret string) AppOption {
 	}
 }

-func WithAuthBaseURL(baseURL string) AppOption {
+func WithExternalBaseURL(url string) AppOption {
 	return func(o *ApplicationConfig) {
-		o.Auth.BaseURL = baseURL
+		o.ExternalBaseURL = url
 	}
 }

--- a/core/config/hardware_defaults.go
+++ b/core/config/hardware_defaults.go
@@ -2,6 +2,7 @@ package config

 import (
 	"fmt"
+	"os"
 	"strconv"
 	"strings"

@@ -9,6 +10,19 @@ import (
 	"github.com/mudler/xlog"
 )

+// HardwareDefaultsDisabled reports whether hardware auto-tuning is turned off via
+// LOCALAI_DISABLE_HARDWARE_DEFAULTS=true (mirrors LOCALAI_DISABLE_GUESSING). When
+// set, ApplyHardwareDefaults and the distributed router's node tuning are
+// skipped entirely, so the backend runs llama.cpp's stock batch/parallel
+// behavior — an escape hatch for users who want predictable, un-tuned defaults.
+func HardwareDefaultsDisabled() bool {
+	// Read directly like the sibling LOCALAI_DISABLE_GUESSING toggle in
+	// hooks_llamacpp.go: these config-layer heuristic switches run deep in the
+	// defaults pipeline with no ApplicationConfig in scope to plumb through.
+	//nolint:forbidigo // config-layer heuristic toggle, mirrors LOCALAI_DISABLE_GUESSING
+	return os.Getenv("LOCALAI_DISABLE_HARDWARE_DEFAULTS") == "true"
+}
+
 // Hardware-driven model-config defaults.
 //
 // This sits alongside the other config overriders (ApplyInferenceDefaults for
@@ -54,8 +68,35 @@ func (g GPU) IsNVIDIABlackwell() bool {
 	return maj >= 12
 }

+// Compute-buffer headroom guard for the raised physical batch.
+//
+// Raising n_ubatch grows the CUDA *compute buffer* (the scratch for the forward
+// graph), which is allocated PER DEVICE — it does not benefit from a second GPU
+// the way weights or KV (which are split across devices) do. The buffer scales
+// ~linearly with n_ubatch * n_ctx, so a large context turns the GB10-tuned
+// ub2048 into multi-GiB of extra scratch that must fit on a SINGLE card. On a
+// 16 GiB consumer Blackwell with a 200k context that overflows (issue #10485),
+// even though the GB10 it was measured on (128 GiB unified memory) had room.
+//
+// These constants size a conservative guard: only raise the batch when the
+// extra scratch fits the per-device VRAM ceiling.
+const (
+	// computeBufferBytesPerCell approximates the CUDA compute-buffer cost of one
+	// (n_ubatch * n_ctx) cell. Derived from an observed allocation (ub2048 *
+	// ctx204800 ~= 4.5 GiB => ~11 B/cell) and rounded up to 16 for margin, since
+	// the real cost also grows with model width (heads / embedding dim) which we
+	// don't know at config time.
+	computeBufferBytesPerCell = 16
+	// blackwellBatchHeadroomDivisor caps the extra compute buffer from raising the
+	// physical batch at VRAM/divisor. /4 keeps the bulk of a device for weights +
+	// KV, which already dominate VRAM use.
+	blackwellBatchHeadroomDivisor = 4
+)
+
 // PhysicalBatch returns the canonical physical batch (n_batch/n_ubatch) for the
-// given hardware, used when the model config leaves batch unset.
+// given hardware class, ignoring context/VRAM headroom. Use
+// PhysicalBatchForContext when a model context and per-device VRAM are known
+// (the load paths) so the raised batch can't overflow a single device.
 func PhysicalBatch(g GPU) int {
 	if g.IsNVIDIABlackwell() {
 		return BlackwellPhysicalBatch
@@ -63,6 +104,51 @@ func PhysicalBatch(g GPU) int {
 	return DefaultPhysicalBatch
 }

+// PhysicalBatchForContext is PhysicalBatch gated on per-device VRAM headroom for
+// the given context: it only raises the batch above the conservative default
+// when the extra compute buffer (which is allocated on a single device and grows
+// with n_ubatch * n_ctx) fits within 1/blackwellBatchHeadroomDivisor of the GPU's
+// VRAM. g.VRAM must be the PER-DEVICE ceiling (the smallest device on a
+// multi-GPU host), not the summed total — the compute buffer can't be split.
+//
+// VRAM 0 (unknown) stays conservative rather than risk a per-device OOM; the
+// GB10 / unified-memory path reports system RAM, so it still clears the guard.
+func PhysicalBatchForContext(g GPU, ctx int) int {
+	if !g.IsNVIDIABlackwell() {
+		return DefaultPhysicalBatch
+	}
+	if g.VRAM == 0 {
+		return DefaultPhysicalBatch
+	}
+	if largeContextForDevice(g, ctx) {
+		return DefaultPhysicalBatch
+	}
+	return BlackwellPhysicalBatch
+}
+
+// largeContextForDevice reports whether the given context is large relative to
+// the per-device VRAM ceiling — the shared "tight single-model fit" signal that
+// suppresses BOTH throughput-oriented defaults (the Blackwell batch boost and
+// the concurrency slot count). It sizes the extra compute-buffer scratch a
+// raised batch would need at this context (which grows ~n_ubatch * n_ctx and
+// is allocated per device) and asks whether it exceeds
+// VRAM/blackwellBatchHeadroomDivisor; when it does, the device has no headroom
+// to spend on throughput and the conservative defaults must hold (issue #10485).
+//
+// g.VRAM must be the PER-DEVICE ceiling (the smallest device on a multi-GPU
+// host). VRAM 0 (unknown) is treated as not-large so detection gaps don't
+// silently disable the defaults.
+func largeContextForDevice(g GPU, ctx int) bool {
+	if g.VRAM == 0 {
+		return false
+	}
+	if ctx <= 0 {
+		ctx = DefaultContextSize
+	}
+	extra := uint64(ctx) * uint64(BlackwellPhysicalBatch-DefaultPhysicalBatch) * computeBufferBytesPerCell
+	return extra > g.VRAM/blackwellBatchHeadroomDivisor
+}
+
 // IsManagedPhysicalBatch reports whether n is a value PhysicalBatch assigns.
 // Callers that re-tune a value chosen by an upstream host (the distributed
 // router correcting the frontend's guess) use this to avoid clobbering an
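
To make the headroom arithmetic in `largeContextForDevice` concrete, here is a standalone sketch of the same predicate. The batch values are assumptions: 2048 is taken from the "ub2048" mentioned in the comments and 512 from the "default 512" note, but the actual `BlackwellPhysicalBatch` and `DefaultPhysicalBatch` constants are not visible in this diff. The `ctx <= 0` fallback to `DefaultContextSize` is omitted because that value is not shown either.

```go
package main

import "fmt"

const (
	defaultBatch   = 512  // assumed value of DefaultPhysicalBatch
	blackwellBatch = 2048 // assumed value of BlackwellPhysicalBatch
	bytesPerCell   = 16   // computeBufferBytesPerCell, as in the diff
	headroomDiv    = 4    // blackwellBatchHeadroomDivisor, as in the diff

	gib = uint64(1) << 30
)

// extraScratchBytes estimates the additional compute buffer that raising the
// physical batch from defaultBatch to blackwellBatch costs at this context.
func extraScratchBytes(ctx int) uint64 {
	return uint64(ctx) * uint64(blackwellBatch-defaultBatch) * bytesPerCell
}

// largeContext mirrors the predicate: true when the extra scratch exceeds
// VRAM/headroomDiv on a single device. Unknown VRAM (0) is never "large".
func largeContext(vram uint64, ctx int) bool {
	if vram == 0 {
		return false
	}
	return extraScratchBytes(ctx) > vram/headroomDiv
}

func main() {
	cases := []struct {
		name string
		vram uint64
		ctx  int
	}{
		{"16 GiB card, 200k ctx", 16 * gib, 204800},
		{"128 GiB unified, 200k ctx", 128 * gib, 204800},
		{"16 GiB card, 8k ctx", 16 * gib, 8192},
	}
	for _, c := range cases {
		fmt.Printf("%-26s extra=%d MiB budget=%d MiB large=%v\n",
			c.name, extraScratchBytes(c.ctx)>>20, (c.vram/headroomDiv)>>20,
			largeContext(c.vram, c.ctx))
	}
}
```

Under these assumed constants, a 204800-token context needs about 4.7 GiB of extra scratch, which exceeds the 4 GiB budget of a 16 GiB card but not the 32 GiB budget of a 128 GiB unified-memory device. That matches the behaviour the comments describe for issue #10485 and for GB10.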
@@ -99,17 +185,50 @@ func DefaultParallelSlots(g GPU) int {
 	}
 }

-// EnsureParallelOption appends a VRAM-scaled "parallel:N" backend option when the
-// model doesn't already set one (and the GPU warrants concurrency). Returns the
-// possibly-extended options. Shared by the single-host config path
-// (ApplyHardwareDefaults) and the distributed router (per selected node).
-func EnsureParallelOption(opts []string, gpu GPU) []string {
-	if slots := DefaultParallelSlots(gpu); slots > 1 && !hasParallelOption(opts) {
+// ParallelSlotsForContext is DefaultParallelSlots gated on per-device VRAM
+// headroom for the given context. A large context already claims most of a
+// single device's VRAM (the KV cache plus the per-slot compute/checkpoint
+// scratch that scales with n_seq_max), so defaulting multiple slots there
+// pushes a tight single-model fit into per-device CUDA OOM (issue #10485): the
+// model loads but the final allocation (e.g. an MTP draft context's KV cache)
+// overflows the tighter card by a few hundred MiB. Returns 1 (no concurrency)
+// in that tight regime, otherwise the VRAM-scaled DefaultParallelSlots.
+//
+// g.VRAM must be the PER-DEVICE ceiling (smallest device on a multi-GPU host).
+// It shares largeContextForDevice with the batch boost so both throughput
+// defaults are suppressed together; the GB10 / unified-memory path reports
+// system RAM and so keeps full concurrency even at large contexts.
+func ParallelSlotsForContext(g GPU, ctx int) int {
+	slots := DefaultParallelSlots(g)
+	if slots <= 1 || g.VRAM == 0 {
+		return slots
+	}
+	if largeContextForDevice(g, ctx) {
+		return 1
+	}
+	return slots
+}
+
+// EnsureParallelOptionForContext appends a VRAM-scaled "parallel:N" backend
+// option when the model doesn't already set one and the GPU warrants (and has
+// headroom for) concurrency at this context. Returns the possibly-extended
+// options. Shared by the single-host config path (ApplyHardwareDefaults) and
+// the distributed router (per selected node).
+func EnsureParallelOptionForContext(opts []string, gpu GPU, ctx int) []string {
+	if slots := ParallelSlotsForContext(gpu, ctx); slots > 1 && !hasParallelOption(opts) {
 		return append(opts, fmt.Sprintf("parallel:%d", slots))
 	}
 	return opts
 }

+// EnsureParallelOption is EnsureParallelOptionForContext with no known context
+// (defaults to DefaultContextSize, which clears the headroom gate on any device
+// large enough to warrant concurrency). Kept for callers without a model
+// context.
+func EnsureParallelOption(opts []string, gpu GPU) []string {
+	return EnsureParallelOptionForContext(opts, gpu, 0)
+}
+
 // hasParallelOption reports whether the model already sets parallel/n_parallel
 // so we never override an explicit value (helper shared with serving_defaults.go).
 func hasParallelOption(opts []string) bool {
@@ -122,7 +241,12 @@ func hasParallelOption(opts []string) bool {
 // deterministic device — detection does a live nvidia-smi call.
 var localGPU = func() GPU {
 	vendor, _ := xsysinfo.DetectGPUVendor()
-	vram, _ := xsysinfo.TotalAvailableVRAM()
+	// Use the SMALLEST device's VRAM, not the summed total: the parallel-slot
+	// tier and the batch headroom guard both reason about what fits on a single
+	// card, and per-device compute buffers can't be split across GPUs. Summing
+	// two 16 GiB cards into "32 GiB" is what over-provisioned multi-GPU hosts
+	// into OOM (issue #10485).
+	vram, _ := xsysinfo.MinPerGPUVRAM()
 	return GPU{
 		Vendor:            vendor,
 		ComputeCapability: xsysinfo.NVIDIAComputeCapability(),
@@ -134,25 +258,36 @@ var localGPU = func() GPU {
 // and were left unset by the user. Currently: a larger physical batch on
 // Blackwell. Explicit config always wins (we only touch zero values).
 func ApplyHardwareDefaults(cfg *ModelConfig, gpu GPU) {
-	if cfg == nil {
+	if cfg == nil || HardwareDefaultsDisabled() {
 		return
 	}
-	if cfg.Batch == 0 && gpu.IsNVIDIABlackwell() {
-		cfg.Batch = BlackwellPhysicalBatch
-		xlog.Debug("[hardware_defaults] Blackwell GPU: defaulting physical batch",
-			"batch", cfg.Batch, "compute_cap", gpu.ComputeCapability)
+	// Raise the physical batch on Blackwell only when the resulting compute
+	// buffer fits the per-device VRAM at THIS model's context. Otherwise leave
+	// Batch at 0 (rather than writing the default 512): that preserves the downstream
+	// single-pass sizing in core/backend.EffectiveBatchSize for embedding/score/rerank.
+	ctx := DefaultContextSize
+	if cfg.ContextSize != nil {
+		ctx = *cfg.ContextSize
+	}
+	if cfg.Batch == 0 {
+		if PhysicalBatchForContext(gpu, ctx) == BlackwellPhysicalBatch {
+			cfg.Batch = BlackwellPhysicalBatch
+			xlog.Debug("[hardware_defaults] Blackwell GPU: defaulting physical batch",
+				"batch", cfg.Batch, "compute_cap", gpu.ComputeCapability, "context", ctx, "vram_gib", gpu.VRAM>>30)
+		}
 	}

 	// Enable concurrent serving by default on a capable GPU: without this the
 	// llama.cpp backend runs n_parallel=1 and serializes multi-user requests
 	// (continuous batching stays off). Unified KV means the slots share the
-	// context budget, so this is concurrency without extra KV memory. Explicit
-	// parallel/n_parallel in the model options always wins.
+	// context budget, but a context large enough to fill a single device leaves
+	// no room for the per-slot scratch, so the slot count is gated on per-device
+	// headroom too (issue #10485). Explicit parallel/n_parallel always wins.
 	if before := len(cfg.Options); true {
-		cfg.Options = EnsureParallelOption(cfg.Options, gpu)
+		cfg.Options = EnsureParallelOptionForContext(cfg.Options, gpu, ctx)
 		if len(cfg.Options) > before {
 			xlog.Debug("[hardware_defaults] defaulting parallel slots for concurrent serving",
-				"option", cfg.Options[len(cfg.Options)-1], "vram_gib", gpu.VRAM>>30)
+				"option", cfg.Options[len(cfg.Options)-1], "context", ctx, "vram_gib", gpu.VRAM>>30)
 		}
 	}
 }
--- a/core/config/hardware_defaults_internal_test.go
+++ b/core/config/hardware_defaults_internal_test.go
@@ -9,26 +9,37 @@ import (
 // GPU. The detection seam (localGPU) is injected so the path is deterministic
 // without a real GPU.
 var _ = Describe("SetDefaults hardware defaults (single-instance)", func() {
+	const gib = uint64(1) << 30
+
 	var orig func() GPU
 	BeforeEach(func() { orig = localGPU })
 	AfterEach(func() { localGPU = orig })

-	It("sets the physical batch on a local Blackwell GPU", func() {
-		localGPU = func() GPU { return GPU{ComputeCapability: "12.1"} }
+	It("sets the physical batch on a local Blackwell GPU with headroom", func() {
+		localGPU = func() GPU { return GPU{ComputeCapability: "12.1", VRAM: 119 * gib} }
 		cfg := &ModelConfig{}
 		cfg.SetDefaults()
 		Expect(cfg.Batch).To(Equal(BlackwellPhysicalBatch))
 	})

+	It("leaves batch unset when a large context would overflow the device", func() {
+		// Regression guard for issue #10485: 16 GiB consumer Blackwell + ~200k ctx.
+		localGPU = func() GPU { return GPU{ComputeCapability: "12.0", VRAM: 16 * gib} }
+		ctx := 204800
+		cfg := &ModelConfig{LLMConfig: LLMConfig{ContextSize: &ctx}}
+		cfg.SetDefaults()
+		Expect(cfg.Batch).To(Equal(0))
+	})
+
 	It("leaves batch unset on a non-Blackwell local GPU", func() {
-		localGPU = func() GPU { return GPU{ComputeCapability: "8.9"} }
+		localGPU = func() GPU { return GPU{ComputeCapability: "8.9", VRAM: 119 * gib} }
 		cfg := &ModelConfig{}
 		cfg.SetDefaults()
 		Expect(cfg.Batch).To(Equal(0))
 	})

 	It("never overrides an explicit batch", func() {
-		localGPU = func() GPU { return GPU{ComputeCapability: "12.1"} }
+		localGPU = func() GPU { return GPU{ComputeCapability: "12.1", VRAM: 119 * gib} }
 		cfg := &ModelConfig{}
 		cfg.Batch = 1024
 		cfg.SetDefaults()
--- a/core/config/hardware_defaults_test.go
+++ b/core/config/hardware_defaults_test.go
@@ -7,6 +7,8 @@ import (
 )

 var _ = Describe("Hardware-driven config defaults", func() {
+	const gib = uint64(1) << 30
+
 	DescribeTable("GPU.IsNVIDIABlackwell (sm_12x consumer family)",
 		func(cc string, want bool) {
 			Expect(GPU{ComputeCapability: cc}.IsNVIDIABlackwell()).To(Equal(want))
@@ -35,29 +37,69 @@ var _ = Describe("Hardware-driven config defaults", func() {
 		})
 	})

+	Describe("PhysicalBatchForContext (per-device VRAM headroom)", func() {
+		It("raises the batch when the compute buffer fits the device", func() {
+			// 16 GiB Blackwell with a small context: the extra scratch is tiny.
+			Expect(PhysicalBatchForContext(GPU{ComputeCapability: "12.0", VRAM: 16 * gib}, 8192)).
+				To(Equal(BlackwellPhysicalBatch))
+		})
+		It("keeps the default batch when a large context would overflow one device", func() {
+			// The issue #10485 case: 16 GiB consumer Blackwell, ~200k context.
+			Expect(PhysicalBatchForContext(GPU{ComputeCapability: "12.0", VRAM: 16 * gib}, 204800)).
+				To(Equal(DefaultPhysicalBatch))
+		})
+		It("still raises the batch on a large unified-memory device (GB10)", func() {
+			// GB10 reports system RAM (~119 GiB) as its single device's VRAM.
+			Expect(PhysicalBatchForContext(GPU{ComputeCapability: "12.1", VRAM: 119 * gib}, 204800)).
+				To(Equal(BlackwellPhysicalBatch))
+		})
+		It("stays conservative when VRAM is unknown", func() {
+			Expect(PhysicalBatchForContext(GPU{ComputeCapability: "12.1"}, 8192)).
+				To(Equal(DefaultPhysicalBatch))
+		})
+		It("never raises the batch on non-Blackwell", func() {
+			Expect(PhysicalBatchForContext(GPU{ComputeCapability: "9.0", VRAM: 80 * gib}, 8192)).
+				To(Equal(DefaultPhysicalBatch))
+		})
+	})
+
 	Describe("ApplyHardwareDefaults", func() {
-		It("raises an unset batch to 2048 on Blackwell", func() {
+		It("raises an unset batch to 2048 on Blackwell with headroom", func() {
 			cfg := &ModelConfig{}
-			ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1"})
+			ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1", VRAM: 119 * gib})
 			Expect(cfg.Batch).To(Equal(BlackwellPhysicalBatch))
 		})
+		It("leaves batch unset when a large context would overflow one device", func() {
+			// Regression guard for issue #10485: 16 GiB card + ~200k context.
+			ctx := 204800
+			cfg := &ModelConfig{LLMConfig: LLMConfig{ContextSize: &ctx}}
+			ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.0", VRAM: 16 * gib})
+			Expect(cfg.Batch).To(Equal(0))
+		})
 		It("leaves batch unset on non-Blackwell", func() {
 			cfg := &ModelConfig{}
-			ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "9.0"})
+			ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "9.0", VRAM: 119 * gib})
 			Expect(cfg.Batch).To(Equal(0))
 		})
 		It("never overrides an explicit batch", func() {
 			cfg := &ModelConfig{}
 			cfg.Batch = 1024
-			ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1"})
+			ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1", VRAM: 119 * gib})
 			Expect(cfg.Batch).To(Equal(1024))
 		})
 		It("no-ops on nil", func() {
 			Expect(func() { ApplyHardwareDefaults(nil, GPU{ComputeCapability: "12.1"}) }).ToNot(Panic())
 		})
-	})

-	const gib = uint64(1) << 30
+		It("applies nothing when hardware defaults are disabled via env", func() {
+			GinkgoT().Setenv("LOCALAI_DISABLE_HARDWARE_DEFAULTS", "true")
+			Expect(HardwareDefaultsDisabled()).To(BeTrue())
+			cfg := &ModelConfig{}
+			ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1", VRAM: 119 * gib})
+			Expect(cfg.Batch).To(Equal(0))
+			Expect(cfg.Options).To(BeEmpty())
+		})
+	})

 	DescribeTable("DefaultParallelSlots (by VRAM)",
 		func(vramGiB uint64, want int) {
@@ -72,12 +114,46 @@ var _ = Describe("Hardware-driven config defaults", func() {
 		Entry("unknown 0", uint64(0), 1),
 	)

+	Describe("ParallelSlotsForContext (per-device VRAM headroom)", func() {
+		It("keeps the VRAM-scaled slot count when the context fits the device", func() {
+			// 16 GiB card, small context: plenty of room for concurrency.
+			Expect(ParallelSlotsForContext(GPU{VRAM: 16 * gib}, 8192)).To(Equal(4))
+		})
+		It("drops to a single slot when a large context already fills the device", func() {
+			// Regression guard for issue #10485: 16 GiB consumer Blackwell, ~200k
+			// context. Even with unified KV, the per-slot compute/checkpoint
+			// scratch from 4 slots is the straw that overflows the tighter device.
+			Expect(ParallelSlotsForContext(GPU{VRAM: 16 * gib}, 204800)).To(Equal(1))
+		})
+		It("keeps concurrency on a large unified-memory device (GB10)", func() {
+			// GB10 reports system RAM (~119 GiB): a 200k context leaves headroom.
+			Expect(ParallelSlotsForContext(GPU{VRAM: 119 * gib}, 204800)).To(Equal(8))
+		})
+		It("keeps concurrency on a big datacenter card with a large context", func() {
+			// 80 GiB A100: 200k context is a small fraction, concurrency stays.
+			Expect(ParallelSlotsForContext(GPU{VRAM: 80 * gib}, 204800)).To(Equal(8))
+		})
+		It("stays a single slot on small/unknown VRAM regardless of context", func() {
+			Expect(ParallelSlotsForContext(GPU{VRAM: 2 * gib}, 8192)).To(Equal(1))
+			Expect(ParallelSlotsForContext(GPU{}, 8192)).To(Equal(1))
+		})
+	})
+
 	Describe("ApplyHardwareDefaults parallel slots", func() {
 		It("adds a VRAM-scaled parallel option on a capable GPU", func() {
 			cfg := &ModelConfig{}
 			ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1", VRAM: 119 * gib})
 			Expect(cfg.Options).To(ContainElement("parallel:8"))
 		})
+		It("adds no parallel option when a large context already fills one device", func() {
+			// Regression guard for issue #10485: 16 GiB card + ~200k context. The
+			// model barely fits; defaulting concurrency tips the tighter GPU into
+			// CUDA OOM during the final (MTP draft) KV allocation.
+			ctx := 204800
+			cfg := &ModelConfig{LLMConfig: LLMConfig{ContextSize: &ctx}}
+			ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.0", VRAM: 16 * gib})
+			Expect(cfg.Options).ToNot(ContainElement(ContainSubstring("parallel")))
+		})
 		It("scales the slot count down with VRAM", func() {
 			cfg := &ModelConfig{}
 			ApplyHardwareDefaults(cfg, GPU{VRAM: 24 * gib})
--- a/core/config/model_config.go
+++ b/core/config/model_config.go
@@ -1204,11 +1204,6 @@ func (cfg *ModelConfig) SetDefaults(opts ...ConfigLoaderOption) {
 	// This ensures gallery-installed and runtime-loaded models get optimal parameters.
 	ApplyInferenceDefaults(cfg, cfg.Name, cfg.Model)

-	// Apply hardware-driven defaults (e.g. a larger physical batch on Blackwell).
-	// Uses the local GPU here; in distributed mode the router re-applies the same
-	// heuristics for the selected node's GPU before loading. Explicit config wins.
-	ApplyHardwareDefaults(cfg, localGPU())
-
 	// Apply serving-policy defaults (device-independent): cross-request prefix
 	// caching. Propagates to distributed nodes via the model options.
 	ApplyServingDefaults(cfg)
@@ -1247,6 +1242,16 @@ func (cfg *ModelConfig) SetDefaults(opts ...ConfigLoaderOption) {
 		cfg.ContextSize = &ctx
 	}
 	runBackendHooks(cfg, lo.modelPath)
+
+	// Apply hardware-driven defaults (a larger physical batch on Blackwell, a
+	// VRAM-scaled parallel-slot count) LAST, after the context size is fully
+	// resolved (explicit config, LoadOptions, then the GGUF guess inside
+	// runBackendHooks): both headroom guards size per-device memory against this
+	// model's context, so they must see the final value, not a pre-guess nil.
+	// Uses the local GPU here; in distributed mode the router re-applies the same
+	// heuristics for the selected node's GPU before loading. Explicit config wins.
+	ApplyHardwareDefaults(cfg, localGPU())
+
 	cfg.syncKnownUsecasesFromString()
 }

--- a/core/config/runtime_settings_persist.go
+++ b/core/config/runtime_settings_persist.go
@@ -5,6 +5,7 @@ import (
 	"errors"
 	"os"
 	"path/filepath"
+	"reflect"
 )

 // runtimeSettingsFile is the on-disk filename inside DynamicConfigsDir.
@@ -33,6 +34,35 @@ func (o *ApplicationConfig) ReadPersistedSettings() (RuntimeSettings, error) {
 	return settings, nil
 }

+// MergeNonNil overlays every set (non-nil) field of overlay onto the
+// receiver, leaving the receiver's value untouched wherever overlay left a
+// field unset. Every RuntimeSettings field is a pointer precisely so "set"
+// can be told apart from "absent" (see the type doc), which makes this a
+// faithful partial update: a caller that submits only the field it owns
+// changes exactly that field and never clobbers unrelated settings.
+//
+// This is the read-modify-write contract the persistence helpers exist for.
+// UpdateSettingsEndpoint reads the on-disk settings, merges the request body
+// on top, and writes the result — so a focused admin page that POSTs only its
+// own field (the Middleware page sends only mitm_listen; the detector table
+// only pii_default_detectors) no longer nulls every other setting.
+//
+// Reflection keeps the merge total over the struct: a field added to
+// RuntimeSettings later is merged automatically, so the persistence path can
+// never silently drop a new setting the way a hand-maintained field list
+// would. Non-pointer fields (none today) are skipped — they cannot express
+// "absent", so the receiver wins.
+func (s *RuntimeSettings) MergeNonNil(overlay RuntimeSettings) {
+	dst := reflect.ValueOf(s).Elem()
+	src := reflect.ValueOf(overlay)
+	for i := 0; i < src.NumField(); i++ {
+		f := src.Field(i)
+		if f.Kind() == reflect.Pointer && !f.IsNil() {
+			dst.Field(i).Set(f)
+		}
+	}
+}
+
 // WritePersistedSettings serialises the given RuntimeSettings to
 // runtime_settings.json with restricted permissions (it may carry API
 // keys and P2P tokens).
--- a/core/config/runtime_settings_persist_test.go
+++ b/core/config/runtime_settings_persist_test.go
@@ -12,6 +12,7 @@ import (
 )

 func strPtr(s string) *string { return &s }
+func boolPtr(b bool) *bool    { return &b }

 var _ = Describe("RuntimeSettings persistence helpers", func() {
 	var (
@@ -51,6 +52,47 @@ var _ = Describe("RuntimeSettings persistence helpers", func() {
 		})
 	})

+	// MergeNonNil is the partial-update primitive UpdateSettingsEndpoint
+	// relies on: a focused admin page POSTs only the field it owns, and the
+	// handler reads the on-disk settings and overlays the request on top.
+	// Without it, the body would be written verbatim and every field the
+	// caller omitted would be nulled (the reported regression: changing
+	// mitm_listen wiped the galleries, api keys, watchdog config, etc.).
+	Describe("MergeNonNil partial update", func() {
+		It("overlays set fields and preserves unset ones", func() {
+			base := config.RuntimeSettings{
+				MITMListen:          strPtr(":9000"),
+				Galleries:           &[]config.Gallery{{Name: "g1", URL: "http://example/g1"}},
+				WatchdogIdleEnabled: boolPtr(true),
+				ApiKeys:             &[]string{"persisted-key"},
+				PIIDefaultDetectors: &[]string{"det-a"},
+			}
+
+			// Simulate the Middleware proxy tab: only mitm_listen is sent.
+			overlay := config.RuntimeSettings{MITMListen: strPtr(":8443")}
+			base.MergeNonNil(overlay)
+
+			Expect(base.MITMListen).ToNot(BeNil())
+			Expect(*base.MITMListen).To(Equal(":8443"), "set field should be overlaid")
+			// Everything the overlay left unset must survive untouched.
+			Expect(base.Galleries).ToNot(BeNil(), "galleries were clobbered")
+			Expect(*base.Galleries).To(HaveLen(1))
+			Expect(base.WatchdogIdleEnabled).ToNot(BeNil())
+			Expect(*base.WatchdogIdleEnabled).To(BeTrue())
+			Expect(base.ApiKeys).ToNot(BeNil(), "api_keys were clobbered")
+			Expect(*base.ApiKeys).To(Equal([]string{"persisted-key"}))
+			Expect(base.PIIDefaultDetectors).ToNot(BeNil(), "pii_default_detectors were clobbered")
+			Expect(*base.PIIDefaultDetectors).To(Equal([]string{"det-a"}))
+		})
+
+		It("lets an explicit empty slice clear a field", func() {
+			base := config.RuntimeSettings{PIIDefaultDetectors: &[]string{"det-a"}}
+			base.MergeNonNil(config.RuntimeSettings{PIIDefaultDetectors: &[]string{}})
+			Expect(base.PIIDefaultDetectors).ToNot(BeNil())
+			Expect(*base.PIIDefaultDetectors).To(BeEmpty(), "an explicit empty slice should clear, not preserve")
+		})
+	})
+
 	// MITM round trip pins the contract that loadRuntimeSettingsFromFile
 	// MITM listener address must survive a write/read round trip so the
 	// next process restart can bring the listener back up. (Intercept
--- a/core/http/app.go
+++ b/core/http/app.go
@@ -149,6 +149,18 @@ func API(application *application.Application) (*echo.Echo, error) {
 	// Middleware - StripPathPrefix must be registered early as it uses Rewrite which runs before routing
 	e.Pre(httpMiddleware.StripPathPrefix())

+	// Stamp the configured external base URL into each request context so
+	// middleware.BaseURL can treat it as authoritative for self-referential
+	// links. Registered as Pre so it runs before routing and handlers.
+	if extBaseURL := application.ApplicationConfig().ExternalBaseURL; extBaseURL != "" {
+		e.Pre(func(next echo.HandlerFunc) echo.HandlerFunc {
+			return func(c echo.Context) error {
+				c.Set("_external_base_url", extBaseURL)
+				return next(c)
+			}
+		})
+	}
+
 	e.Pre(middleware.RemoveTrailingSlash())

 	if application.ApplicationConfig().MachineTag != "" {
--- a/core/http/endpoints/localai/agent_collections.go
+++ b/core/http/endpoints/localai/agent_collections.go
@@ -70,7 +70,7 @@ func UploadToCollectionEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		file, err := c.FormFile("file")
 		if err != nil {
 			return c.JSON(http.StatusBadRequest, map[string]string{"error": "file required"})
@@ -116,7 +116,7 @@ func ListCollectionEntriesEndpoint(app *application.Application) echo.HandlerFun
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		entries, err := svc.ListCollectionEntriesForUser(userID, c.Param("name"))
+		entries, err := svc.ListCollectionEntriesForUser(userID, decodedParam(c, "name"))
 		if err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -139,7 +139,7 @@ func GetCollectionEntryContentEndpoint(app *application.Application) echo.Handle
 		if err != nil {
 			entry = entryParam
 		}
-		content, chunkCount, err := svc.GetCollectionEntryContentForUser(userID, c.Param("name"), entry)
+		content, chunkCount, err := svc.GetCollectionEntryContentForUser(userID, decodedParam(c, "name"), entry)
 		if err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -164,7 +164,7 @@ func SearchCollectionEndpoint(app *application.Application) echo.HandlerFunc {
 		if err := c.Bind(&payload); err != nil {
 			return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
 		}
-		results, err := svc.SearchCollectionForUser(userID, c.Param("name"), payload.Query, payload.MaxResults)
+		results, err := svc.SearchCollectionForUser(userID, decodedParam(c, "name"), payload.Query, payload.MaxResults)
 		if err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -182,7 +182,7 @@ func ResetCollectionEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		if err := svc.ResetCollectionForUser(userID, c.Param("name")); err != nil {
+		if err := svc.ResetCollectionForUser(userID, decodedParam(c, "name")); err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
 			}
@@ -202,7 +202,7 @@ func DeleteCollectionEntryEndpoint(app *application.Application) echo.HandlerFun
 		if err := c.Bind(&payload); err != nil {
 			return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
 		}
-		remaining, err := svc.DeleteCollectionEntryForUser(userID, c.Param("name"), payload.Entry)
+		remaining, err := svc.DeleteCollectionEntryForUser(userID, decodedParam(c, "name"), payload.Entry)
 		if err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -230,7 +230,7 @@ func AddCollectionSourceEndpoint(app *application.Application) echo.HandlerFunc
 		if payload.UpdateInterval < 1 {
 			payload.UpdateInterval = 60
 		}
-		if err := svc.AddCollectionSourceForUser(userID, c.Param("name"), payload.URL, payload.UpdateInterval); err != nil {
+		if err := svc.AddCollectionSourceForUser(userID, decodedParam(c, "name"), payload.URL, payload.UpdateInterval); err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
 			}
@@ -250,7 +250,7 @@ func RemoveCollectionSourceEndpoint(app *application.Application) echo.HandlerFu
 		if err := c.Bind(&payload); err != nil {
 			return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
 		}
-		if err := svc.RemoveCollectionSourceForUser(userID, c.Param("name"), payload.URL); err != nil {
+		if err := svc.RemoveCollectionSourceForUser(userID, decodedParam(c, "name"), payload.URL); err != nil {
 			return c.JSON(http.StatusInternalServerError, map[string]string{"error": err.Error()})
 		}
 		return c.JSON(http.StatusOK, map[string]string{"status": "ok"})
@@ -267,7 +267,7 @@ func GetCollectionEntryRawFileEndpoint(app *application.Application) echo.Handle
 		if err != nil {
 			entry = entryParam
 		}
-		fpath, err := svc.GetCollectionEntryFilePathForUser(userID, c.Param("name"), entry)
+		fpath, err := svc.GetCollectionEntryFilePathForUser(userID, decodedParam(c, "name"), entry)
 		if err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -282,7 +282,7 @@ func ListCollectionSourcesEndpoint(app *application.Application) echo.HandlerFun
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		sources, err := svc.ListCollectionSourcesForUser(userID, c.Param("name"))
+		sources, err := svc.ListCollectionSourcesForUser(userID, decodedParam(c, "name"))
 		if err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
--- a/core/http/endpoints/localai/agent_collections_param_test.go
+++ b/core/http/endpoints/localai/agent_collections_param_test.go
@@ -0,0 +1,49 @@
+package localai
+
+import (
+	"net/http"
+	"net/http/httptest"
+
+	"github.com/labstack/echo/v4"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+// Regression for #10443: agent/collection names carry a "legacy-api-key:"
+// prefix, so the ':' is percent-encoded as %3A in the request path. Echo routes
+// such paths via URL.RawPath and stores the path-param value still escaped, so
+// handlers must URL-decode it before looking the collection up in the store -
+// otherwise the lookup sees "legacy-api-key%3ALiteraryResearch" and 404s.
+var _ = Describe("decodedParam", func() {
+	var e *echo.Echo
+
+	BeforeEach(func() {
+		e = echo.New()
+	})
+
+	// route runs a request through Echo's real router so the path param is
+	// populated exactly as it would be in production, then returns the decoded
+	// value the handler would observe.
+	route := func(rawPath string) string {
+		var got string
+		e.GET("/api/agents/collections/:name/upload", func(c echo.Context) error {
+			got = decodedParam(c, "name")
+			return c.NoContent(http.StatusOK)
+		})
+		req := httptest.NewRequest(http.MethodGet, rawPath, nil)
+		rec := httptest.NewRecorder()
+		e.ServeHTTP(rec, req)
+		Expect(rec.Code).To(Equal(http.StatusOK))
+		return got
+	}
+
+	It("decodes a percent-encoded colon in the collection name", func() {
+		got := route("/api/agents/collections/legacy-api-key%3ALiteraryResearch/upload")
+		Expect(got).To(Equal("legacy-api-key:LiteraryResearch"))
+	})
+
+	It("leaves an unencoded name untouched", func() {
+		got := route("/api/agents/collections/PlainCollection/upload")
+		Expect(got).To(Equal("PlainCollection"))
+	})
+})
--- a/core/http/endpoints/localai/agents.go
+++ b/core/http/endpoints/localai/agents.go
@@ -6,6 +6,7 @@ import (
 	"io"
 	"maps"
 	"net/http"
+	"net/url"
 	"os"
 	"path/filepath"
 	"slices"
@@ -33,6 +34,22 @@ func getUserID(c echo.Context) string {
 	return user.ID
 }

+// decodedParam returns the named path parameter, URL-decoding it.
+//
+// Echo routes a request via URL.RawPath whenever the path contains
+// percent-encoded characters (e.g. %3A for ':'), and in that case stores the
+// matched path-param value raw/escaped. Agent and collection names carry a
+// "legacy-api-key:" prefix, so the ':' arrives as %3A and the raw param no
+// longer matches the stored name. Callers must unescape before lookups.
+// Falls back to the raw value if it isn't valid percent-encoding.
+func decodedParam(c echo.Context, name string) string {
+	raw := c.Param(name)
+	if decoded, err := url.PathUnescape(raw); err == nil {
+		return decoded
+	}
+	return raw
+}
+
 // isAdminUser returns true if the authenticated user has admin role.
 func isAdminUser(c echo.Context) bool {
 	user := auth.GetUser(c)
@@ -127,7 +144,7 @@ func GetAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")

 		statuses := svc.ListAgentsForUser(userID)
 		active, exists := statuses[name]
@@ -142,7 +159,7 @@ func UpdateAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		var cfg state.AgentConfig
 		if err := c.Bind(&cfg); err != nil {
 			return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
@@ -161,7 +178,7 @@ func DeleteAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		if err := svc.DeleteAgentForUser(userID, name); err != nil {
 			return c.JSON(http.StatusInternalServerError, map[string]string{"error": err.Error()})
 		}
@@ -173,7 +190,7 @@ func GetAgentConfigEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		cfg := svc.GetAgentConfigForUser(userID, name)
 		if cfg == nil {
 			return c.JSON(http.StatusNotFound, map[string]string{"error": "Agent not found"})
@@ -186,7 +203,7 @@ func PauseAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		if err := svc.PauseAgentForUser(userID, c.Param("name")); err != nil {
+		if err := svc.PauseAgentForUser(userID, decodedParam(c, "name")); err != nil {
 			return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
 		}
 		return c.JSON(http.StatusOK, map[string]string{"status": "ok"})
@@ -197,7 +214,7 @@ func ResumeAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		if err := svc.ResumeAgentForUser(userID, c.Param("name")); err != nil {
+		if err := svc.ResumeAgentForUser(userID, decodedParam(c, "name")); err != nil {
 			return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
 		}
 		return c.JSON(http.StatusOK, map[string]string{"status": "ok"})
@@ -208,7 +225,7 @@ func GetAgentStatusEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")

 		history := svc.GetAgentStatusForUser(userID, name)
 		if history == nil {
@@ -241,7 +258,7 @@ func GetAgentObservablesEndpoint(app *application.Application) echo.HandlerFunc
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")

 		history, err := svc.GetAgentObservablesForUser(userID, name)
 		if err != nil {
@@ -261,7 +278,7 @@ func ClearAgentObservablesEndpoint(app *application.Application) echo.HandlerFun
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		if err := svc.ClearAgentObservablesForUser(userID, name); err != nil {
 			return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
 		}
@@ -273,7 +290,7 @@ func ChatWithAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		var payload struct {
 			Message string `json:"message"`
 		}
@@ -302,7 +319,7 @@ func AgentSSEEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")

 		// Try local SSE manager first
 		manager := svc.GetSSEManagerForUser(userID, name)
@@ -334,7 +351,7 @@ func ExportAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		data, err := svc.ExportAgentForUser(userID, name)
 		if err != nil {
 			return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
--- a/core/http/endpoints/localai/settings.go
+++ b/core/http/endpoints/localai/settings.go
@@ -4,8 +4,6 @@ import (
 	"encoding/json"
 	"io"
 	"net/http"
-	"os"
-	"path/filepath"
 	"time"

 	"github.com/labstack/echo/v4"
@@ -110,6 +108,18 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
 			})
 		}

+		// Read whatever is already persisted: it is both the source of truth
+		// for branding asset filenames (below) and the base we merge this
+		// request onto before writing. A read failure must not let a Save
+		// silently discard the existing settings — surface it instead.
+		persisted, err := appConfig.ReadPersistedSettings()
+		if err != nil {
+			return c.JSON(http.StatusInternalServerError, schema.SettingsResponse{
+				Success: false,
+				Error:   "Failed to read existing settings: " + err.Error(),
+			})
+		}
+
 		// Branding asset filenames are owned exclusively by
 		// /api/branding/asset/{kind} (upload/delete). The Settings page also
 		// round-trips them via GET /api/settings, but its local state is stale
@@ -118,11 +128,9 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
 		// at page open. Replace whatever the body sent for these three fields
 		// with the values currently on disk so /api/settings can never
 		// regress them.
-		if existing, err := appConfig.ReadPersistedSettings(); err == nil {
-			settings.LogoFile = existing.LogoFile
-			settings.LogoHorizontalFile = existing.LogoHorizontalFile
-			settings.FaviconFile = existing.FaviconFile
-		}
+		settings.LogoFile = persisted.LogoFile
+		settings.LogoHorizontalFile = persisted.LogoHorizontalFile
+		settings.FaviconFile = persisted.FaviconFile

 		// The UI reads ApiKeys from GET /api/settings, which already returns the
 		// merged env+runtime list. When the user clicks Save, the same merged
@@ -145,16 +153,17 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
 			settings.ApiKeys = &runtimeOnly
 		}

-		settingsFile := filepath.Join(appConfig.DynamicConfigsDir, "runtime_settings.json")
-		settingsJSON, err := json.MarshalIndent(settings, "", "  ")
-		if err != nil {
-			return c.JSON(http.StatusInternalServerError, schema.SettingsResponse{
-				Success: false,
-				Error:   "Failed to marshal settings: " + err.Error(),
-			})
-		}
-
-		if err := os.WriteFile(settingsFile, settingsJSON, 0600); err != nil {
+		// Persist as a partial update: overlay only the fields this request set
+		// onto the settings already on disk. Focused admin pages POST just the
+		// keys they own (the Middleware proxy tab sends only mitm_listen; the
+		// detector table only pii_default_detectors), so writing the request
+		// body verbatim would null every unrelated setting (the no-omitempty
+		// api_keys / pii_default_detectors fields even round-trip as JSON
+		// null). The full Settings page still round-trips every field, so its
+		// Save is unchanged.
+		toPersist := persisted
+		toPersist.MergeNonNil(settings)
+		if err := appConfig.WritePersistedSettings(toPersist); err != nil {
 			return c.JSON(http.StatusInternalServerError, schema.SettingsResponse{
 				Success: false,
 				Error:   "Failed to write settings file: " + err.Error(),
@@ -262,7 +271,14 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
 			}
 		}

-		if settings.MITMListen != nil {
+		// Rebuild the MITM listener when its address OR the instance-wide
+		// default detectors change. The per-host detector map is resolved once
+		// at listener start (startMITMLocked → ResolvePIIPolicy), so a
+		// default-detector change is otherwise invisible to cloud-proxy traffic
+		// until the next restart — an admin toggling a default detector would
+		// see no redaction. RestartMITM is a no-op when the listener is
+		// disabled (empty address).
+		if settings.MITMListen != nil || settings.PIIDefaultDetectors != nil {
 			if err := app.RestartMITM(); err != nil {
 				xlog.Error("Failed to restart MITM proxy", "error", err)
 				return c.JSON(http.StatusInternalServerError, schema.SettingsResponse{
--- a/core/http/endpoints/localai/settings_test.go
+++ b/core/http/endpoints/localai/settings_test.go
@@ -52,6 +52,10 @@ var _ = Describe("Settings endpoints", func() {
 		// Settings are persisted here; set after construction since there's no
 		// dedicated AppOption for it.
 		app.ApplicationConfig().DynamicConfigsDir = tmp
+		// Contain the MITM CA inside tmp too. The partial-save spec flips
+		// mitm_listen, which starts the listener and writes a CA; without this
+		// it defaults to ./mitm-ca and litters the package source tree.
+		app.ApplicationConfig().MITMCADir = filepath.Join(tmp, "mitm-ca")

 		e = echo.New()
 		e.GET("/api/settings", GetSettingsEndpoint(app))
@@ -109,6 +113,57 @@ var _ = Describe("Settings endpoints", func() {
 		Expect(err).ToNot(HaveOccurred())
 	})

+	// Regression: a focused admin page (the Middleware proxy tab) POSTs only
+	// the one field it owns — mitm_listen. The old handler wrote the request
+	// body verbatim, so every other persisted setting was dropped (and
+	// api_keys / pii_default_detectors, which lack omitempty, were written as
+	// null). A partial POST must now merge onto what is already on disk.
+	It("preserves unrelated persisted settings when a partial POST sets only mitm_listen", func() {
+		// First save establishes a fuller settings file (as the full Settings
+		// page would): galleries, an API key, and the MITM listener. The
+		// listener restart binds a real socket, so use 127.0.0.1:0 for an
+		// ephemeral free port rather than a fixed one that may be in use.
+		rec := post(`{"mitm_listen":"127.0.0.1:0","galleries":[{"name":"g1","url":"http://example/g1"}],"api_keys":["k1"],"pii_default_detectors":["det-a"]}`)
+		Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
+
+		// The Middleware proxy tab then changes only the listen address — the
+		// exact partial body that nulled everything else before the fix.
+		rec = post(`{"mitm_listen":"127.0.0.1:0"}`)
+		Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
+
+		raw, err := os.ReadFile(filepath.Join(tmp, "runtime_settings.json"))
+		Expect(err).ToNot(HaveOccurred())
+		var ondisk config.RuntimeSettings
+		Expect(json.Unmarshal(raw, &ondisk)).To(Succeed())
+
+		Expect(ondisk.MITMListen).ToNot(BeNil())
+		Expect(*ondisk.MITMListen).To(Equal("127.0.0.1:0"), "the changed field should be saved")
+		Expect(ondisk.Galleries).ToNot(BeNil(), "galleries were clobbered by the partial save")
+		Expect(*ondisk.Galleries).To(HaveLen(1))
+		Expect(ondisk.ApiKeys).ToNot(BeNil(), "api_keys were nulled by the partial save")
+		Expect(*ondisk.ApiKeys).To(Equal([]string{"k1"}))
+		Expect(ondisk.PIIDefaultDetectors).ToNot(BeNil(), "pii_default_detectors were nulled by the partial save")
+		Expect(*ondisk.PIIDefaultDetectors).To(Equal([]string{"det-a"}))
+	})
+
+	// The MITM listener resolves its per-host PII detectors once at start
+	// (startMITMLocked → ResolvePIIPolicy), and the handler used to restart it
+	// only when mitm_listen changed. So an admin toggling a default detector
+	// (the Middleware detector table POSTs only pii_default_detectors) left
+	// cloud-proxy traffic unredacted until the next restart. A
+	// pii_default_detectors change must now rebuild the listener.
+	It("rebuilds the MITM listener when only pii_default_detectors changes", func() {
+		rec := post(`{"mitm_listen":"127.0.0.1:0"}`)
+		Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
+		srv1 := app.MITMServer()
+		Expect(srv1).ToNot(BeNil(), "listener should be running after mitm_listen is set")
+
+		rec = post(`{"pii_default_detectors":["det-a"]}`)
+		Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
+		Expect(app.MITMServer()).ToNot(BeIdenticalTo(srv1),
+			"a default-detector change must restart the listener so it picks up the new detectors")
+	})
+
 	// Residual #9125: enabling the watchdog from a cold (off) state via the
 	// React master toggle must start the live watchdog immediately, without a
 	// restart. The toggle posts watchdog_idle_enabled/busy_enabled=true while
--- a/core/http/endpoints/openai/realtime_model.go
+++ b/core/http/endpoints/openai/realtime_model.go
@@ -432,7 +432,7 @@ func loadSoundDetectionConfig(pipeline *config.Pipeline, cl *config.ModelConfigL
 	if pipeline.SoundDetection == "" {
 		return nil, nil
 	}
-	cfg, err := cl.LoadModelConfigFileByName(pipeline.SoundDetection, ml.ModelPath)
+	cfg, err := loadPipelineSubModel(cl, pipeline.SoundDetection, ml.ModelPath)
 	if err != nil {
 		return nil, fmt.Errorf("failed to load sound detection config: %w", err)
 	}
@@ -443,7 +443,7 @@ func loadSoundDetectionConfig(pipeline *config.Pipeline, cl *config.ModelConfigL
 }

 func newTranscriptionOnlyModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) (Model, *config.ModelConfig, error) {
-	cfgVAD, err := cl.LoadModelConfigFileByName(pipeline.VAD, ml.ModelPath)
+	cfgVAD, err := loadPipelineSubModel(cl, pipeline.VAD, ml.ModelPath)
 	if err != nil {

 		return nil, nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -453,7 +453,7 @@ func newTranscriptionOnlyModel(pipeline *config.Pipeline, cl *config.ModelConfig
 		return nil, nil, fmt.Errorf("failed to validate config: %w", err)
 	}

-	cfgSST, err := cl.LoadModelConfigFileByName(pipeline.Transcription, ml.ModelPath)
+	cfgSST, err := loadPipelineSubModel(cl, pipeline.Transcription, ml.ModelPath)
 	if err != nil {

 		return nil, nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -542,11 +542,30 @@ func buildRealtimeRoutingContext(a *application.Application, sessionID string) *
 	}
 }

+// loadPipelineSubModel loads a pipeline sub-model config by name and follows a
+// single alias hop, so a pipeline that references an alias (e.g. `llm: default`)
+// gets the alias target's full config (Backend, Model, ...) rather than the
+// alias stub with an empty Backend. Without this the alias survives unresolved
+// into model loading and fails downstream — notably in distributed mode with
+// "backend name is empty". Mirrors the top-level alias resolution in
+// core/http/middleware/request.go.
+func loadPipelineSubModel(cl *config.ModelConfigLoader, name, modelPath string) (*config.ModelConfig, error) {
+	cfg, err := cl.LoadModelConfigFileByName(name, modelPath)
+	if err != nil {
+		return nil, err
+	}
+	resolved, _, err := cl.ResolveAlias(cfg)
+	if err != nil {
+		return nil, err
+	}
+	return resolved, nil
+}
+
 // returns and loads either a wrapped model or a model that support audio-to-audio
 func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, evaluator *templates.Evaluator, routing *RealtimeRoutingContext) (Model, error) {
 	xlog.Debug("Creating new model pipeline model", "pipeline", pipeline)

-	cfgVAD, err := cl.LoadModelConfigFileByName(pipeline.VAD, ml.ModelPath)
+	cfgVAD, err := loadPipelineSubModel(cl, pipeline.VAD, ml.ModelPath)
 	if err != nil {

 		return nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -557,7 +576,7 @@ func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model
 	}

 	// TODO: Do we always need a transcription model? It can be disabled. Note that any-to-any instruction following models don't transcribe as such, so if transcription is required it is a separate process
-	cfgSST, err := cl.LoadModelConfigFileByName(pipeline.Transcription, ml.ModelPath)
+	cfgSST, err := loadPipelineSubModel(cl, pipeline.Transcription, ml.ModelPath)
 	if err != nil {

 		return nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -589,7 +608,7 @@ func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model
 	xlog.Debug("Loading a wrapped model")

 	// Otherwise we want to return a wrapped model, which is a "virtual" model that re-uses other models to perform operations
-	cfgLLM, err := cl.LoadModelConfigFileByName(pipeline.LLM, ml.ModelPath)
+	cfgLLM, err := loadPipelineSubModel(cl, pipeline.LLM, ml.ModelPath)
 	if err != nil {

 		return nil, fmt.Errorf("failed to load backend config: %w", err)
@@ -604,7 +623,7 @@ func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model
 	applyPipelineReasoning(cfgLLM, *pipeline)
 	applyPipelineThinking(cfgLLM, *pipeline)

-	cfgTTS, err := cl.LoadModelConfigFileByName(pipeline.TTS, ml.ModelPath)
+	cfgTTS, err := loadPipelineSubModel(cl, pipeline.TTS, ml.ModelPath)
 	if err != nil {

 		return nil, fmt.Errorf("failed to load backend config: %w", err)
--- a/core/http/endpoints/openai/realtime_model_alias_test.go
+++ b/core/http/endpoints/openai/realtime_model_alias_test.go
@@ -0,0 +1,52 @@
+package openai
+
+import (
+	"os"
+	"path/filepath"
+
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+
+	"github.com/mudler/LocalAI/core/config"
+)
+
+// loadPipelineSubModel must resolve a pipeline sub-model that references an
+// alias (e.g. `llm: default`) one hop to the alias target's full config — so
+// the effective backend is the target's backend, not the empty backend of the
+// alias stub. This mirrors the top-level alias resolution done in
+// core/http/middleware/request.go, which the realtime pipeline previously
+// skipped (failing in distributed mode with "backend name is empty").
+var _ = Describe("loadPipelineSubModel", func() {
+	It("resolves a sub-model alias one hop to the target's config", func() {
+		tmpDir := GinkgoT().TempDir()
+
+		// A real model config with a concrete backend.
+		realLLM := `name: real-llm
+backend: llama-cpp
+parameters:
+  model: real-llm.gguf
+`
+		Expect(os.WriteFile(filepath.Join(tmpDir, "real-llm.yaml"), []byte(realLLM), 0644)).To(Succeed())
+
+		// An alias pointing at the real model.
+		aliasCfg := `name: default
+alias: real-llm
+`
+		Expect(os.WriteFile(filepath.Join(tmpDir, "default.yaml"), []byte(aliasCfg), 0644)).To(Succeed())
+
+		cl := config.NewModelConfigLoader(tmpDir)
+		Expect(cl.LoadModelConfigsFromPath(tmpDir)).To(Succeed())
+
+		// Resolving the alias must follow the hop to the target's full config.
+		resolved, err := loadPipelineSubModel(cl, "default", tmpDir)
+		Expect(err).NotTo(HaveOccurred())
+		Expect(resolved.IsAlias()).To(BeFalse())
+		Expect(resolved.Backend).To(Equal("llama-cpp"))
+
+		// A non-alias name must load unchanged.
+		direct, err := loadPipelineSubModel(cl, "real-llm", tmpDir)
+		Expect(err).NotTo(HaveOccurred())
+		Expect(direct.Backend).To(Equal("llama-cpp"))
+		Expect(direct.Name).To(Equal("real-llm"))
+	})
+})
--- a/core/http/middleware/baseurl.go
+++ b/core/http/middleware/baseurl.go
@@ -55,17 +55,70 @@ func BasePathPrefix(c echo.Context) string {
 // The returned URL is guaranteed to end with `/`.
 // The method should be used in conjunction with the StripPathPrefix middleware.
 func BaseURL(c echo.Context) string {
+	// An explicit external base URL (LOCALAI_BASE_URL) is authoritative for
+	// the origin. The proxy-derived path prefix is still appended so a
+	// reverse-proxy mount point keeps working. Trailing slashes are
+	// normalized via BasePathPrefix, which always starts and ends with "/".
+	if ext, ok := c.Get("_external_base_url").(string); ok && ext != "" {
+		return strings.TrimRight(ext, "/") + BasePathPrefix(c)
+	}
+
+	fwdProto, fwdHost := parseForwarded(c.Request().Header.Get("Forwarded"))
+
 	scheme := "http"
-	if c.Request().Header.Get("X-Forwarded-Proto") == "https" {
+	switch {
+	case c.Request().TLS != nil:
 		scheme = "https"
-	} else if c.Request().TLS != nil {
+	case strings.EqualFold(firstToken(c.Request().Header.Get("X-Forwarded-Proto")), "https"):
+		scheme = "https"
+	case strings.EqualFold(fwdProto, "https"):
 		scheme = "https"
 	}

 	host := c.Request().Host
 	if forwardedHost := c.Request().Header.Get("X-Forwarded-Host"); forwardedHost != "" {
 		host = forwardedHost
+	} else if fwdHost != "" {
+		host = fwdHost
 	}

 	return scheme + "://" + host + BasePathPrefix(c)
 }
+
+// firstToken returns the first comma-separated token of v, trimmed of spaces.
+// Reverse-proxy chains can emit X-Forwarded-Proto as "https,http"; only the
+// first hop (closest to the client) is meaningful for scheme detection.
+func firstToken(v string) string {
+	if i := strings.IndexByte(v, ','); i >= 0 {
+		v = v[:i]
+	}
+	return strings.TrimSpace(v)
+}
+
+// parseForwarded extracts the proto and host directives from the first element
+// of an RFC 7239 Forwarded header (e.g. `for=x;proto=https;host=h, for=y`).
+// Values may be quoted. Returns empty strings when absent or malformed so the
+// caller can fall through to other signals.
+func parseForwarded(header string) (proto, host string) {
+	if header == "" {
+		return "", ""
+	}
+	// Only the first element (closest proxy to the client) matters here.
+	if i := strings.IndexByte(header, ','); i >= 0 {
+		header = header[:i]
+	}
+	for _, directive := range strings.Split(header, ";") {
+		key, value, ok := strings.Cut(strings.TrimSpace(directive), "=")
+		if !ok {
+			continue
+		}
+		value = strings.Trim(strings.TrimSpace(value), `"`)
+		switch strings.ToLower(strings.TrimSpace(key)) {
+		case "proto":
+			proto = value
+		case "host":
+			host = value
+		}
+	}
+	return proto, host
+}
--- a/core/http/middleware/baseurl_test.go
+++ b/core/http/middleware/baseurl_test.go
@@ -135,4 +135,138 @@ var _ = Describe("BaseURL", func() {
 			Entry("missing leading slash", "evil"),
 		)
 	})
+
+	Context("scheme detection hardening", func() {
+		It("treats comma-separated X-Forwarded-Proto as https when first token is https", func() {
+			app := echo.New()
+			actualURL := ""
+			app.GET("/x", func(c echo.Context) error {
+				actualURL = BaseURL(c)
+				return nil
+			})
+			req := httptest.NewRequest("GET", "/x", nil)
+			req.Header.Set("X-Forwarded-Proto", "https,http")
+			rec := httptest.NewRecorder()
+			app.ServeHTTP(rec, req)
+			Expect(actualURL).To(Equal("https://example.com/"))
+		})
+
+		It("derives https from the RFC 7239 Forwarded proto directive", func() {
+			app := echo.New()
+			actualURL := ""
+			app.GET("/x", func(c echo.Context) error {
+				actualURL = BaseURL(c)
+				return nil
+			})
+			req := httptest.NewRequest("GET", "/x", nil)
+			req.Header.Set("Forwarded", "for=192.0.2.1;proto=https;host=proxy.example")
+			rec := httptest.NewRecorder()
+			app.ServeHTTP(rec, req)
+			Expect(actualURL).To(Equal("https://proxy.example/"))
+		})
+
+		It("prefers X-Forwarded-Host over the Forwarded host directive", func() {
+			app := echo.New()
+			actualURL := ""
+			app.GET("/x", func(c echo.Context) error {
+				actualURL = BaseURL(c)
+				return nil
+			})
+			req := httptest.NewRequest("GET", "/x", nil)
+			req.Header.Set("X-Forwarded-Host", "xfh.example")
+			req.Header.Set("Forwarded", "host=fwd.example;proto=https")
+			rec := httptest.NewRecorder()
+			app.ServeHTTP(rec, req)
+			Expect(actualURL).To(Equal("https://xfh.example/"))
+		})
+	})
+
+	Context("explicit external base URL override", func() {
+		It("uses the configured origin over conflicting forwarded headers", func() {
+			app := echo.New()
+			actualURL := ""
+			app.GET("/x", func(c echo.Context) error {
+				c.Set("_external_base_url", "https://192.168.0.13:34567")
+				actualURL = BaseURL(c)
+				return nil
+			})
+			req := httptest.NewRequest("GET", "/x", nil)
+			req.Header.Set("X-Forwarded-Proto", "http")
+			req.Header.Set("X-Forwarded-Host", "internal:8080")
+			rec := httptest.NewRecorder()
+			app.ServeHTTP(rec, req)
+			Expect(actualURL).To(Equal("https://192.168.0.13:34567/"))
+		})
+
+		It("combines the configured origin with a detected path prefix", func() {
+			app := echo.New()
+			actualURL := ""
+			app.GET("/hello", func(c echo.Context) error {
+				c.Set("_original_path", "/localai/hello")
+				c.Set("_external_base_url", "https://ext.example")
+				actualURL = BaseURL(c)
+				return nil
+			})
+			req := httptest.NewRequest("GET", "/hello", nil)
+			rec := httptest.NewRecorder()
+			app.ServeHTTP(rec, req)
+			Expect(actualURL).To(Equal("https://ext.example/localai/"))
+		})
+
+		It("ignores an empty override", func() {
+			app := echo.New()
+			actualURL := ""
+			app.GET("/x", func(c echo.Context) error {
+				c.Set("_external_base_url", "")
+				actualURL = BaseURL(c)
+				return nil
+			})
+			req := httptest.NewRequest("GET", "/x", nil)
+			rec := httptest.NewRecorder()
+			app.ServeHTTP(rec, req)
+			Expect(actualURL).To(Equal("http://example.com/"))
+		})
+	})
+
+	Context("parseForwarded helper", func() {
+		It("parses unquoted proto and host", func() {
+			proto, host := parseForwarded("for=192.0.2.1;proto=https;host=h.example")
+			Expect(proto).To(Equal("https"))
+			Expect(host).To(Equal("h.example"))
+		})
+
+		It("strips quotes around values", func() {
+			proto, host := parseForwarded(`proto="https";host="h.example"`)
+			Expect(proto).To(Equal("https"))
+			Expect(host).To(Equal("h.example"))
+		})
+
+		It("uses only the first element of a multi-element header", func() {
+			proto, host := parseForwarded("proto=https;host=first.example, proto=http;host=second.example")
+			Expect(proto).To(Equal("https"))
+			Expect(host).To(Equal("first.example"))
+		})
+
+		It("returns empty strings for an empty header", func() {
+			proto, host := parseForwarded("")
+			Expect(proto).To(BeEmpty())
+			Expect(host).To(BeEmpty())
+		})
+
+		It("skips directives without a value", func() {
+			proto, host := parseForwarded("proto;host=h.example")
+			Expect(proto).To(BeEmpty())
+			Expect(host).To(Equal("h.example"))
+		})
+	})
+
+	Context("firstToken helper", func() {
+		It("returns the whole trimmed string when there is no comma", func() {
+			Expect(firstToken("  https  ")).To(Equal("https"))
+		})
+
+		It("returns the first trimmed token when there is a comma", func() {
+			Expect(firstToken("https , http")).To(Equal("https"))
+		})
+	})
 })
--- a/core/http/react-ui/e2e/role-mode-adaptive.spec.js
+++ b/core/http/react-ui/e2e/role-mode-adaptive.spec.js
@@ -1,100 +0,0 @@
-import { test, expect } from './coverage-fixtures.js'
-
-// These specs stub /api/features and /api/auth/status per cell. The test server
-// disables auth (isAdmin=true) and reports its own features, so we intercept
-// before navigation to simulate each role x mode cell.
-
-function stubFeatures(page, features) {
-  return page.route('**/api/features', route =>
-    route.fulfill({ contentType: 'application/json', body: JSON.stringify(features) }))
-}
-
-function stubNoP2P(page) {
-  // P2P token endpoint returns empty -> p2pEnabled=false.
-  return page.route('**/api/p2p/token', route =>
-    route.fulfill({ contentType: 'text/plain', body: '' }))
-}
-
-test.describe('Adaptive landing (HomeRoute)', () => {
-  test('admin + distributed redirects /app to Nodes', async ({ page }) => {
-    await stubFeatures(page, { distributed: true })
-    await stubNoP2P(page)
-    await page.goto('/app')
-    await expect(page).toHaveURL(/\/app\/nodes$/)
-    await expect(page.locator('.page-title').first()).toBeVisible({ timeout: 15_000 })
-  })
-
-  test('admin + single-node stays on Home', async ({ page }) => {
-    await stubFeatures(page, { distributed: false })
-    await stubNoP2P(page)
-    await page.goto('/app')
-    await expect(page).toHaveURL(/\/app$/)
-    await expect(page.locator('.home-greeting')).toBeVisible({ timeout: 15_000 })
-  })
-})
-
-test.describe('Adaptive sidebar', () => {
-  test('distributed pins the Cluster group with Nodes at the top', async ({ page }) => {
-    await stubFeatures(page, { distributed: true })
-    await stubNoP2P(page)
-    await page.goto('/app/chat') // any in-app page so the sidebar is mounted
-    const pinned = page.locator('.sidebar-nav .sidebar-section-items').first()
-    await expect(pinned.getByText('Nodes', { exact: false })).toBeVisible({ timeout: 15_000 })
-  })
-
-  test('single-node does not pin a Cluster group', async ({ page }) => {
-    await stubFeatures(page, { distributed: false })
-    await stubNoP2P(page)
-    await page.goto('/app/chat')
-    // Nodes is reachable only via the Operate rail, not pinned at the top.
-    await expect(page.locator('.sidebar-nav')).toBeVisible({ timeout: 15_000 })
-    await expect(page.locator('.sidebar-nav .sidebar-section-items').first()
-      .getByText('Nodes', { exact: false })).toHaveCount(0)
-  })
-})
-
-test.describe('Top navbar', () => {
-  test('admin sees the mode pill and settings cog', async ({ page }) => {
-    await stubFeatures(page, { distributed: true })
-    await stubNoP2P(page)
-    await page.goto('/app/chat')
-    await expect(page.locator('.top-navbar__mode')).toBeVisible({ timeout: 15_000 })
-    await expect(page.locator('.top-navbar__icon[aria-label]')).not.toHaveCount(0)
-  })
-
-  test('admin-via-chat jump shows when localai_assistant is enabled', async ({ page }) => {
-    await stubFeatures(page, { distributed: false, localai_assistant: true })
-    await stubNoP2P(page)
-    await page.goto('/app/chat')
-    await expect(page.locator('.top-navbar__assistant')).toBeVisible({ timeout: 15_000 })
-  })
-
-  test('admin-via-chat jump hidden when localai_assistant is off', async ({ page }) => {
-    await stubFeatures(page, { distributed: false, localai_assistant: false })
-    await stubNoP2P(page)
-    await page.goto('/app/chat')
-    await expect(page.locator('.top-navbar__assistant')).toHaveCount(0)
-  })
-})
-
-test.describe('Token usage meter', () => {
-  test('renders when admin usage has data', async ({ page }) => {
-    await stubFeatures(page, { distributed: false })
-    await stubNoP2P(page)
-    await page.route('**/api/auth/admin/usage**', route =>
-      route.fulfill({ contentType: 'application/json',
-        body: JSON.stringify({ buckets: [{ total_tokens: 1234 }] }) }))
-    await page.goto('/app/chat')
-    await expect(page.locator('.top-navbar__meter')).toBeVisible({ timeout: 15_000 })
-  })
-
-  test('hidden when admin usage is empty (graceful degrade)', async ({ page }) => {
-    await stubFeatures(page, { distributed: false })
-    await stubNoP2P(page)
-    await page.route('**/api/auth/admin/usage**', route =>
-      route.fulfill({ contentType: 'application/json', body: JSON.stringify({ buckets: [] }) }))
-    await page.goto('/app/chat')
-    await expect(page.locator('.top-navbar')).toBeVisible({ timeout: 15_000 })
-    await expect(page.locator('.top-navbar__meter')).toHaveCount(0)
-  })
-})
--- a/core/http/react-ui/public/locales/en/chat.json
+++ b/core/http/react-ui/public/locales/en/chat.json
@@ -86,6 +86,7 @@
  "input": {
    "placeholder": "Message...",
    "attachFile": "Attach file",
+    "send": "Send message",
    "stopGenerating": "Stop generating",
    "canvasTitle": "Canvas — extract code blocks and media into a side panel for preview, copy, and download",
    "canvasLabel": "Canvas",
--- a/core/http/react-ui/public/locales/en/home.json
+++ b/core/http/react-ui/public/locales/en/home.json
@@ -77,6 +77,21 @@
    "noModelsTitle": "No Models Available",
    "noModelsBody": "There are no models installed yet. Ask your administrator to set up models so you can start chatting."
  },
+  "starters": {
+    "title": "Recommended for your hardware",
+    "tier": {
+      "cpu": "CPU-only",
+      "gpu-small": "GPU",
+      "gpu-mid": "GPU",
+      "gpu-large": "GPU"
+    },
+    "cpuNote": "No GPU detected — these small models stay responsive on CPU.",
+    "gpuNote": "Picked to fit your available VRAM with room for context.",
+    "install": "Install",
+    "installing": "Installing",
+    "installStarted": "Installing {{model}}…",
+    "installFailed": "Install failed: {{message}}"
+  },
  "connect": {
    "title": "One endpoint, every API",
    "subtitle": "LocalAI serves its own full API — image & video generation, depth, object detection, reranking, audio, face & voice recognition, and realtime voice over WebRTC and WebSocket. On top of that, a drop-in compatibility layer lets any app built for OpenAI, Anthropic, Ollama or OpenAI Responses talk to it unchanged.",
--- a/core/http/react-ui/public/locales/en/models.json
+++ b/core/http/react-ui/public/locales/en/models.json
@@ -2,6 +2,16 @@
  "title": "Install Models",
  "subtitle": "Browse and install AI models from the gallery",
  "models": "Models",
+  "recommended": {
+    "title": "Recommended for your hardware",
+    "cpuNote": "No GPU detected - small models that stay responsive on CPU.",
+    "gpuNote": "Sized to fit your available VRAM with room for context.",
+    "install": "Install",
+    "installing": "Installing",
+    "installStarted": "Installing {{model}}…",
+    "installFailed": "Install failed: {{message}}",
+    "dismiss": "Dismiss recommendations"
+  },
  "stats": {
    "available": "Available",
    "installed": "Installed"
--- a/core/http/react-ui/public/locales/en/nav.json
+++ b/core/http/react-ui/public/locales/en/nav.json
@@ -12,16 +12,6 @@
  "accountSettings": "Account settings",
  "account": "Account",
  "accountFor": "Account: {{name}}",
-  "topbar": {
-    "label": "Top bar",
-    "modeDistributed": "Distributed",
-    "modeSwarm": "Swarm",
-    "modeSingle": "Single-node",
-    "pickModel": "Models",
-    "adminViaChat": "Admin via chat",
-    "tokensToday": "Tokens today",
-    "usageDetail": "View usage detail"
-  },
  "sections": {
    "create": "Create",
    "recognition": "Recognition",
--- a/core/http/react-ui/public/locales/id/admin.json
+++ b/core/http/react-ui/public/locales/id/admin.json
@@ -45,7 +45,7 @@
  },
  "scheduling": {
    "title": "Penjadwalan",
-    "subtitle": "Aturan penempatan model dan replika di seluruh klaster"
+    "subtitle": "Aturan penempatan model dan replika di seluruh kluster"
  },
  "p2p": {
    "title": "Komputasi AI Terdistribusi",
@@ -86,4 +86,4 @@
    "title": "Penjelajah",
    "subtitle": "Jelajahi file dan konfigurasi"
  }
-}
+}
--- a/core/http/react-ui/public/locales/id/chat.json
+++ b/core/http/react-ui/public/locales/id/chat.json
@@ -72,7 +72,7 @@
  "actions": {
    "copy": "Salin",
    "regenerate": "Hasilkan ulang",
-    "jumpToLatest": "Jump to latest"
+    "jumpToLatest": "Lompat ke terbaru"
  },
  "streaming": {
    "transferring": "Mentransfer model...",
@@ -115,4 +115,4 @@
    "clearAll": "Hapus semua",
    "deleteAllTitle": "Hapus semua percakapan"
  }
-}
+}
--- a/core/http/react-ui/public/locales/id/common.json
+++ b/core/http/react-ui/public/locales/id/common.json
@@ -1,8 +1,8 @@
 {
  "unsaved": {
-    "title": "Discard unsaved changes?",
-    "message": "You have unsaved changes that will be lost if you leave this page.",
-    "leave": "Leave"
+    "title": "Buang perubahan yang belum disimpan?",
+    "message": "Anda memiliki perubahan yang belum disimpan. Perubahan tersebut akan hilang jika Anda meninggalkan halaman ini.",
+    "leave": "Tinggalkan Halaman"
  },
  "actions": {
    "save": "Simpan",
--- a/core/http/react-ui/public/locales/id/home.json
+++ b/core/http/react-ui/public/locales/id/home.json
@@ -7,15 +7,15 @@
  "resourceGpu": "GPU",
  "resourceRam": "RAM",
  "greeting": {
-    "morning": "Good morning",
-    "afternoon": "Good afternoon",
-    "evening": "Good evening",
-    "night": "Working late"
+    "morning": "Selamat pagi",
+    "afternoon": "Selamat siang",
+    "evening": "Selamat malam",
+    "night": "Selamat lembur"
  },
  "statusLine": {
-    "modelsLoaded_one": "{{count}} model loaded",
-    "modelsLoaded_other": "{{count}} models loaded",
-    "noModelsLoaded": "No models loaded",
+    "modelsLoaded_one": "{{count}} model dimuat",
+    "modelsLoaded_other": "{{count}} model dimuat",
+    "noModelsLoaded": "Tidak ada model yang dimuat",
    "nodes_one": "{{count}} node",
    "nodes_other": "{{count}} nodes"
  },
@@ -79,14 +79,14 @@
  },
  "connect": {
    "title": "Satu endpoint, semua API",
-    "subtitle": "LocalAI menyediakan API miliknya sendiri yang lengkap — pembuatan gambar & video, depth, deteksi objek, reranking, audio, pengenalan wajah & suara, serta suara realtime melalui WebRTC dan WebSocket. Di atas itu, lapisan kompatibilitas drop-in membuat aplikasi apa pun yang dibuat untuk OpenAI, Anthropic, Ollama, atau OpenAI Responses bekerja tanpa perubahan.",
+    "subtitle": "LocalAI menyediakan API miliknya sendiri yang lengkap — pembuatan gambar & video, depth, deteksi objek, reranking, audio, pengenalan wajah & suara, serta suara realtime melalui WebRTC dan WebSocket. Selain itu, lapisan kompatibilitas drop-in membuat aplikasi apa pun yang dibuat untuk OpenAI, Anthropic, Ollama, atau OpenAI Responses bekerja tanpa perubahan.",
    "nativeTitle": "API native",
    "compatTitle": "Kompatibilitas drop-in",
    "apiReference": "Referensi API lengkap",
    "copy": "Salin",
    "copied": "Disalin",
-    "browse": "Browse the API",
-    "hide": "Hide endpoints",
-    "dismiss": "Dismiss"
+    "browse": "Jelajahi API",
+    "hide": "Sembunyikan endpoint",
+    "dismiss": "Abaikan"
  }
 }