Compare commits

...

35 Commits

Author SHA1 Message Date
Ettore Di Giacinto
1f0110368d step-flash fixes
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 23:36:16 +01:00
Ettore Di Giacinto
2fd026e958 fix: update moonshine API, add setuptools to voxcpm requirements (#8541)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 23:22:37 +01:00
LocalAI [bot]
08718b656e chore: ⬆️ Update ggml-org/llama.cpp to 338085c69e486b7155e5b03d7b5087e02c0e2528 (#8538)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-12 23:21:53 +01:00
LocalAI [bot]
7121b189f7 chore(model-gallery): ⬆️ update checksum (#8540)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-12 21:54:33 +01:00
Richard Palethorpe
f6c80a6987 feat(realtime): Allow sending text, image and audio conversation items (#8524)
feat(realtime): Allow sending text and image conversation items

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-12 19:33:46 +00:00
Ettore Di Giacinto
4a4d65f8e8 chore(model gallery): add vllm-omni models (#8536)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 18:27:20 +01:00
Ettore Di Giacinto
2858e71606 chore(model gallery): add neutts (#8535)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 18:17:03 +01:00
Ettore Di Giacinto
088205339c chore(model gallery): add voxcpm, whisperx, moonshine-tiny (#8534)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 18:13:03 +01:00
Ettore Di Giacinto
8616397d59 chore(model gallery): add nemo-asr (#8533)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 18:01:42 +01:00
Ettore Di Giacinto
1698f92bd0 Remove URL entry from gallery index
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-12 17:50:13 +01:00
Ettore Di Giacinto
02c95a57ae Add known use cases for audio processing
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-12 17:49:54 +01:00
rampa3
2ab6be1d0c chore(model gallery): Add npc-llm-3-8b (#8498)
Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-02-12 17:46:25 +01:00
Ettore Di Giacinto
9d78ec1bd8 chore(model gallery): add voxtral (which is only available in development) (#8532)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 17:44:24 +01:00
LocalAI [bot]
b10b85de52 chore: improve log levels verbosity (#8528)
* chore: init for PR

* feat: improve log verbosity per #8449 - demote /api/resources to DEBUG, elevate job events to INFO

---------

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-12 16:24:46 +01:00
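
For context, a minimal sketch of the level changes described above, assuming a zerolog-style leveled logger; the `/api/resources` endpoint comes from the commit message, while the job name is only illustrative:

```go
package main

import (
	"os"

	"github.com/rs/zerolog"
	"github.com/rs/zerolog/log"
)

func main() {
	log.Logger = log.Output(zerolog.ConsoleWriter{Out: os.Stderr})
	// At the default INFO level, DEBUG messages are suppressed.
	zerolog.SetGlobalLevel(zerolog.InfoLevel)

	// Noisy, frequently polled endpoint: demoted to DEBUG so it no longer floods the log.
	log.Debug().Str("endpoint", "/api/resources").Msg("handled resources request")

	// Job lifecycle event: elevated to INFO so operators still see it
	// ("backend-install" is a made-up example name).
	log.Info().Str("job", "backend-install").Msg("job started")
}
```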
Richard Palethorpe
1479bee894 fix(realtime): Sampling and websocket locking (#8521)
* fix(realtime): Use locked websocket for concurrent access

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(realtime): Use sample rate set in session

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(config): Allow pipelines to have no model parameters

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-12 13:57:34 +01:00
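
As a rough illustration of the locking fix above: most websocket libraries allow only one writer at a time, so concurrent producers (transcription, VAD, response events) must serialize their writes. A minimal sketch with stand-in types rather than the actual realtime connection:

```go
package main

import (
	"fmt"
	"sync"
)

// wsConn stands in for whatever websocket connection type the realtime endpoint uses.
type wsConn interface {
	WriteJSON(v any) error
}

// lockedConn serializes writes from multiple goroutines with a mutex,
// which is the essence of the "locked websocket" change.
type lockedConn struct {
	mu   sync.Mutex
	conn wsConn
}

func (l *lockedConn) WriteJSON(v any) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.conn.WriteJSON(v)
}

// stubConn exists only so this sketch compiles and runs on its own.
type stubConn struct{}

func (stubConn) WriteJSON(v any) error { fmt.Println("sent:", v); return nil }

func main() {
	c := &lockedConn{conn: stubConn{}}
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = c.WriteJSON(map[string]int{"event": i})
		}(i)
	}
	wg.Wait()
}
```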
Austen
cff972094c feat(diffusers): add experimental support for sd_embed-style prompt embedding (#8504)
* add experimental support for sd_embed-style prompt embedding

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

* add doc equivalent to compel

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

* need to use flux1 embedding function for flux model

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

---------

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
2026-02-11 22:58:19 +01:00
LocalAI [bot]
79a25f7ae9 chore: ⬆️ Update ggml-org/llama.cpp to 4d3daf80f8834e0eb5148efc7610513f1e263653 (#8513)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-11 21:48:39 +00:00
Richard Palethorpe
7270a98ce5 fix(realtime): Use user provided voice and allow pipeline models to have no backend (#8415)
* fix(realtime): Use the voice provided by the user or none at all

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* fix(ui,config): Allow pipeline models to have no backend and use same validation in frontend

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-11 14:18:05 +01:00
LocalAI [bot]
0ee92317ec chore: ⬆️ Update ggml-org/llama.cpp to 57487a64c88c152ac72f3aea09bd1cc491b2f61e (#8499)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-10 21:32:46 +00:00
LocalAI [bot]
743d2d1947 chore: ⬆️ Update ggml-org/whisper.cpp to 764482c3175d9c3bc6089c1ec84df7d1b9537d83 (#8478)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-10 15:14:59 +01:00
LocalAI [bot]
df04843f34 chore: ⬆️ Update ggml-org/llama.cpp to 262364e31d1da43596fe84244fba44e94a0de64e (#8479)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-10 15:14:33 +01:00
Kolega.dev
780877d1d0 security: validate URLs to prevent SSRF in content fetching endpoints (#8476)
User-supplied URLs passed to GetContentURIAsBase64() and downloadFile()
were fetched without validation, allowing SSRF attacks against internal
services. Added URL validation that blocks private IPs, loopback,
link-local, and cloud metadata endpoints before fetching.

Co-authored-by: kolega.dev <faizan@kolega.ai>
2026-02-10 15:14:14 +01:00
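
For illustration, a minimal sketch of the kind of check described above. `validateRemoteURL` is a hypothetical name; the real guard lives in `GetContentURIAsBase64()` and `downloadFile()` and may be structured differently:

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// validateRemoteURL rejects URLs whose host resolves to a loopback, private,
// link-local, or unspecified address before any fetch happens. The cloud
// metadata endpoint (169.254.169.254) falls in the link-local range.
func validateRemoteURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid URL: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	ips, err := net.LookupIP(u.Hostname())
	if err != nil {
		return fmt.Errorf("cannot resolve %q: %w", u.Hostname(), err)
	}
	for _, ip := range ips {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
			return fmt.Errorf("refusing to fetch %q: resolves to restricted address %s", raw, ip)
		}
	}
	return nil
}

func main() {
	for _, u := range []string{
		"https://example.com/image.png",
		"http://169.254.169.254/latest/meta-data/",
	} {
		fmt.Println(u, "->", validateRemoteURL(u))
	}
}
```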
dependabot[bot]
08eeed61f4 chore(deps): bump github.com/openai/openai-go/v3 from 3.17.0 to 3.19.0 (#8485)
Bumps [github.com/openai/openai-go/v3](https://github.com/openai/openai-go) from 3.17.0 to 3.19.0.
- [Release notes](https://github.com/openai/openai-go/releases)
- [Changelog](https://github.com/openai/openai-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-go/compare/v3.17.0...v3.19.0)

---
updated-dependencies:
- dependency-name: github.com/openai/openai-go/v3
  dependency-version: 3.19.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 05:41:15 +00:00
dependabot[bot]
5207ff84dc chore(deps): bump github.com/alecthomas/kong from 1.13.0 to 1.14.0 (#8481)
Bumps [github.com/alecthomas/kong](https://github.com/alecthomas/kong) from 1.13.0 to 1.14.0.
- [Commits](https://github.com/alecthomas/kong/compare/v1.13.0...v1.14.0)

---
updated-dependencies:
- dependency-name: github.com/alecthomas/kong
  dependency-version: 1.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 04:29:00 +00:00
dependabot[bot]
4ade2e61ab chore(deps): bump github.com/onsi/ginkgo/v2 from 2.28.0 to 2.28.1 (#8483)
Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.28.0 to 2.28.1.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/ginkgo/compare/v2.28.0...v2.28.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.28.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 03:15:46 +00:00
dependabot[bot]
818be98314 chore(deps): bump github.com/jaypipes/ghw from 0.21.2 to 0.22.0 (#8484)
Bumps [github.com/jaypipes/ghw](https://github.com/jaypipes/ghw) from 0.21.2 to 0.22.0.
- [Release notes](https://github.com/jaypipes/ghw/releases)
- [Commits](https://github.com/jaypipes/ghw/compare/v0.21.2...v0.22.0)

---
updated-dependencies:
- dependency-name: github.com/jaypipes/ghw
  dependency-version: 0.22.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-10 02:02:38 +00:00
dependabot[bot]
056c438452 chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.20.0 to 1.22.0 (#8482)
chore(deps): bump github.com/anthropics/anthropic-sdk-go

Bumps [github.com/anthropics/anthropic-sdk-go](https://github.com/anthropics/anthropic-sdk-go) from 1.20.0 to 1.22.0.
- [Release notes](https://github.com/anthropics/anthropic-sdk-go/releases)
- [Changelog](https://github.com/anthropics/anthropic-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/anthropics/anthropic-sdk-go/compare/v1.20.0...v1.22.0)

---
updated-dependencies:
- dependency-name: github.com/anthropics/anthropic-sdk-go
  dependency-version: 1.22.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-09 23:34:36 +00:00
LocalAI [bot]
0c040beb59 chore: ⬆️ Update antirez/voxtral.c to c9e8773a2042d67c637fc492c8a655c485354080 (#8477)
⬆️ Update antirez/voxtral.c

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-09 22:20:03 +01:00
Ettore Di Giacinto
bf5a1dd840 feat(voxtral): add voxtral backend (#8451)
* feat(voxtral): add voxtral backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* simplify

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-09 09:12:05 +01:00
rampa3
f44200bec8 chore(model gallery): Add Ministral 3 family of models (aside from base versions) (#8467)
Signed-off-by: rampa3 <68955305+rampa3@users.noreply.github.com>
2026-02-09 09:10:37 +01:00
LocalAI [bot]
3b1b08efd6 chore: ⬆️ Update ggml-org/llama.cpp to e06088da0fa86aa444409f38dff274904931c507 (#8464)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-09 09:09:32 +01:00
LocalAI [bot]
3d8791067f chore: ⬆️ Update ggml-org/whisper.cpp to 4b23ff249e7f93137cb870b28fb27818e074c255 (#8463)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-09 09:08:55 +01:00
Austen
da8207b73b feat(stablediffusion-ggml): Improve legacy CPU support for stablediffusion-ggml backend (#8461)
* Port AVX logic from whisper to stablediffusion-ggml

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

* disable BMI2 on AVX builds

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

---------

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-08 23:11:33 +00:00
Varun Chawla
aa9ca401fa docs: update model gallery documentation to reference main repository (#8452)
Fixes #8212 - Updated the note about reporting broken models to
reference the main LocalAI repository instead of the outdated
separate gallery repository reference.
2026-02-08 22:14:23 +01:00
LocalAI [bot]
e43c0c3ffc docs: ⬆️ update docs version mudler/LocalAI (#8462)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-08 21:12:50 +00:00
59 changed files with 2213 additions and 158 deletions

.env
View File

@@ -26,6 +26,9 @@
## Disables COMPEL (Diffusers)
# COMPEL=0
## Disables SD_EMBED (Diffusers)
# SD_EMBED=0
## Enable/Disable single backend (useful if only one GPU is available)
# LOCALAI_SINGLE_ACTIVE_BACKEND=true

View File

@@ -1674,6 +1674,20 @@ jobs:
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
# voxtral
- build-type: ''
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64,linux/arm64'
tag-latest: 'auto'
tag-suffix: '-cpu-voxtral'
runs-on: 'ubuntu-latest'
base-image: "ubuntu:24.04"
skip-drivers: 'false'
backend: "voxtral"
dockerfile: "./backend/Dockerfile.golang"
context: "./"
ubuntu-version: '2404'
#silero-vad
- build-type: ''
cuda-major-version: ""
@@ -1945,6 +1959,10 @@ jobs:
tag-suffix: "-metal-darwin-arm64-whisper"
build-type: "metal"
lang: "go"
- backend: "voxtral"
tag-suffix: "-metal-darwin-arm64-voxtral"
build-type: "metal"
lang: "go"
- backend: "vibevoice"
tag-suffix: "-metal-darwin-arm64-vibevoice"
build-type: "mps"

View File

@@ -30,6 +30,10 @@ jobs:
variable: "PIPER_VERSION"
branch: "master"
file: "backend/go/piper/Makefile"
- repository: "antirez/voxtral.c"
variable: "VOXTRAL_VERSION"
branch: "main"
file: "backend/go/voxtral/Makefile"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6

View File

@@ -361,3 +361,34 @@ jobs:
run: |
make --jobs=5 --output-sync=target -C backend/python/voxcpm
make --jobs=5 --output-sync=target -C backend/python/voxcpm test
tests-voxtral:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v6
with:
submodules: true
- name: Dependencies
run: |
sudo apt-get update
sudo apt-get install -y build-essential cmake curl libopenblas-dev ffmpeg
- name: Setup Go
uses: actions/setup-go@v5
# You can test your matrix by printing the current Go version
- name: Display Go version
run: go version
- name: Proto Dependencies
run: |
# Install protoc
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
PATH="$PATH:$HOME/go/bin" make protogen-go
- name: Build voxtral
run: |
make --jobs=5 --output-sync=target -C backend/go/voxtral
- name: Test voxtral
run: |
make --jobs=5 --output-sync=target -C backend/go/voxtral test

View File

@@ -1,5 +1,5 @@
# Disable parallel execution for backend builds
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/voxtral
GOCMD=go
GOTEST=$(GOCMD) test
@@ -453,6 +453,7 @@ BACKEND_HUGGINGFACE = huggingface|golang|.|false|true
BACKEND_SILERO_VAD = silero-vad|golang|.|false|true
BACKEND_STABLEDIFFUSION_GGML = stablediffusion-ggml|golang|.|--progress=plain|true
BACKEND_WHISPER = whisper|golang|.|false|true
BACKEND_VOXTRAL = voxtral|golang|.|false|true
# Python backends with root context
BACKEND_RERANKERS = rerankers|python|.|false|true
@@ -506,6 +507,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_HUGGINGFACE)))
$(eval $(call generate-docker-build-target,$(BACKEND_SILERO_VAD)))
$(eval $(call generate-docker-build-target,$(BACKEND_STABLEDIFFUSION_GGML)))
$(eval $(call generate-docker-build-target,$(BACKEND_WHISPER)))
$(eval $(call generate-docker-build-target,$(BACKEND_VOXTRAL)))
$(eval $(call generate-docker-build-target,$(BACKEND_RERANKERS)))
$(eval $(call generate-docker-build-target,$(BACKEND_TRANSFORMERS)))
$(eval $(call generate-docker-build-target,$(BACKEND_OUTETTS)))
@@ -533,7 +535,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_ACE_STEP)))
docker-save-%: backend-images
docker save local-ai-backend:$* -o backend-images/$*.tar
docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step
docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-voxtral
########################################################
### Mock Backend for E2E Tests

View File

@@ -20,7 +20,7 @@ RUN apt-get update && \
build-essential \
git ccache \
ca-certificates \
make cmake wget \
make cmake wget libopenblas-dev \
curl unzip \
libssl-dev && \
apt-get clean && \

View File

@@ -1,5 +1,5 @@
LLAMA_VERSION?=8872ad2125336d209a9911a82101f80095a9831d
LLAMA_VERSION?=338085c69e486b7155e5b03d7b5087e02c0e2528
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

View File

@@ -294,6 +294,76 @@ json parse_options(bool streaming, const backend::PredictOptions* predict, const
return data;
}
static bool template_uses_arguments_items_filter(const std::string & template_src) {
return template_src.find("arguments|items") != std::string::npos ||
template_src.find("arguments | items") != std::string::npos ||
template_src.find("arguments| items") != std::string::npos ||
template_src.find("arguments |items") != std::string::npos;
}
static void normalize_tool_call_arguments_for_template(
json & messages,
const std::string & template_src,
const char * request_name)
{
if (!messages.is_array() || !template_uses_arguments_items_filter(template_src)) {
return;
}
size_t converted = 0;
size_t failed = 0;
for (auto & message : messages) {
if (!message.is_object() || !message.contains("tool_calls") || !message["tool_calls"].is_array()) {
continue;
}
for (auto & tool_call : message["tool_calls"]) {
if (!tool_call.is_object() || !tool_call.contains("function") || !tool_call["function"].is_object()) {
continue;
}
auto & function = tool_call["function"];
if (!function.contains("arguments")) {
continue;
}
auto & arguments = function["arguments"];
if (!arguments.is_string()) {
continue;
}
const std::string args_str = arguments.get<std::string>();
if (args_str.empty()) {
arguments = json::object();
converted++;
continue;
}
try {
json parsed_args = json::parse(args_str);
if (parsed_args.is_object()) {
arguments = parsed_args;
converted++;
}
} catch (const json::parse_error &) {
failed++;
}
}
}
if (converted > 0) {
SRV_INF("[TOOLS DEBUG] %s: Converted %zu tool call argument strings to JSON objects for arguments|items template compatibility\n",
request_name,
converted);
}
if (failed > 0) {
SRV_WRN("[TOOLS DEBUG] %s: Failed to parse %zu tool call argument strings as JSON for arguments|items template compatibility\n",
request_name,
failed);
}
}
const std::vector<ggml_type> kv_cache_types = {
GGML_TYPE_F32,
@@ -1255,6 +1325,11 @@ public:
body_json["add_generation_prompt"] = data["add_generation_prompt"];
}
if (body_json.contains("messages") && ctx_server.impl->chat_params.tmpls) {
const auto template_src = common_chat_templates_source(ctx_server.impl->chat_params.tmpls.get());
normalize_tool_call_arguments_for_template(body_json["messages"], template_src, "PredictStream");
}
// Debug: Print full body_json before template processing (includes messages, tools, tool_choice, etc.)
SRV_DBG("[CONVERSATION DEBUG] PredictStream: Full body_json before oaicompat_chat_params_parse:\n%s\n", body_json.dump(2).c_str());
@@ -1986,6 +2061,11 @@ public:
body_json["add_generation_prompt"] = data["add_generation_prompt"];
}
if (body_json.contains("messages") && ctx_server.impl->chat_params.tmpls) {
const auto template_src = common_chat_templates_source(ctx_server.impl->chat_params.tmpls.get());
normalize_tool_call_arguments_for_template(body_json["messages"], template_src, "Predict");
}
// Debug: Print full body_json before template processing (includes messages, tools, tool_choice, etc.)
SRV_DBG("[CONVERSATION DEBUG] Predict: Full body_json before oaicompat_chat_params_parse:\n%s\n", body_json.dump(2).c_str());

View File

@@ -2,5 +2,5 @@ package/
sources/
.cache/
build/
libgosd.so
*.so
stablediffusion-ggml

View File

@@ -66,15 +66,18 @@ sources/stablediffusion-ggml.cpp:
git checkout $(STABLEDIFFUSION_GGML_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
libgosd.so: sources/stablediffusion-ggml.cpp CMakeLists.txt gosd.cpp gosd.h
mkdir -p build && \
cd build && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build/libgosd.so ./
# Detect OS
UNAME_S := $(shell uname -s)
stablediffusion-ggml: main.go gosd.go libgosd.so
# Only build CPU variants on Linux
ifeq ($(UNAME_S),Linux)
VARIANT_TARGETS = libgosd-avx.so libgosd-avx2.so libgosd-avx512.so libgosd-fallback.so
else
# On non-Linux (e.g., Darwin), build only fallback variant
VARIANT_TARGETS = libgosd-fallback.so
endif
stablediffusion-ggml: main.go gosd.go $(VARIANT_TARGETS)
CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o stablediffusion-ggml ./
package: stablediffusion-ggml
@@ -82,5 +85,46 @@ package: stablediffusion-ggml
build: package
clean:
rm -rf libgosd.so build stablediffusion-ggml package sources
clean: purge
rm -rf libgosd*.so stablediffusion-ggml package sources
purge:
rm -rf build*
# Build all variants (Linux only)
ifeq ($(UNAME_S),Linux)
libgosd-avx.so: sources/stablediffusion-ggml.cpp
$(MAKE) purge
$(info ${GREEN}I stablediffusion-ggml build info:avx${RESET})
SO_TARGET=libgosd-avx.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosd-custom
rm -rfv build*
libgosd-avx2.so: sources/stablediffusion-ggml.cpp
$(MAKE) purge
$(info ${GREEN}I stablediffusion-ggml build info:avx2${RESET})
SO_TARGET=libgosd-avx2.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgosd-custom
rm -rfv build*
libgosd-avx512.so: sources/stablediffusion-ggml.cpp
$(MAKE) purge
$(info ${GREEN}I stablediffusion-ggml build info:avx512${RESET})
SO_TARGET=libgosd-avx512.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgosd-custom
rm -rfv build*
endif
# Build fallback variant (all platforms)
libgosd-fallback.so: sources/stablediffusion-ggml.cpp
$(MAKE) purge
$(info ${GREEN}I stablediffusion-ggml build info:fallback${RESET})
SO_TARGET=libgosd-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosd-custom
rm -rfv build*
libgosd-custom: CMakeLists.txt gosd.cpp gosd.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
mv build-$(SO_TARGET)/libgosd.so ./$(SO_TARGET)
all: stablediffusion-ggml package

View File

@@ -2,6 +2,7 @@ package main
import (
"flag"
"os"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
@@ -17,7 +18,13 @@ type LibFuncs struct {
}
func main() {
gosd, err := purego.Dlopen("./libgosd.so", purego.RTLD_NOW|purego.RTLD_GLOBAL)
// Get library name from environment variable, default to fallback
libName := os.Getenv("SD_LIBRARY")
if libName == "" {
libName = "./libgosd-fallback.so"
}
gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(err)
}

View File

@@ -11,7 +11,7 @@ REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/libgosd.so $CURDIR/package/
cp -avf $CURDIR/libgosd-*.so $CURDIR/package/
cp -avf $CURDIR/stablediffusion-ggml $CURDIR/package/
cp -fv $CURDIR/run.sh $CURDIR/package/

View File

@@ -1,14 +1,52 @@
#!/bin/bash
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
cd /
echo "CPU info:"
if [ "$(uname)" != "Darwin" ]; then
grep -e "model\sname" /proc/cpuinfo | head -1
grep -e "flags" /proc/cpuinfo | head -1
fi
LIBRARY="$CURDIR/libgosd-fallback.so"
if [ "$(uname)" != "Darwin" ]; then
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgosd-avx.so ]; then
LIBRARY="$CURDIR/libgosd-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libgosd-avx2.so ]; then
LIBRARY="$CURDIR/libgosd-avx2.so"
fi
fi
# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/libgosd-avx512.so ]; then
LIBRARY="$CURDIR/libgosd-avx512.so"
fi
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
export SD_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/stablediffusion-ggml "$@"
fi
exec $CURDIR/stablediffusion-ggml "$@"
echo "Using library: $LIBRARY"
exec $CURDIR/stablediffusion-ggml "$@"

backend/go/voxtral/.gitignore vendored Normal file
View File

@@ -0,0 +1,9 @@
.cache/
sources/
build/
build-*/
package/
voxtral
*.so
*.dylib
compile_commands.json

View File

@@ -0,0 +1,84 @@
cmake_minimum_required(VERSION 3.12)
if(USE_METAL)
project(govoxtral LANGUAGES C OBJC)
else()
project(govoxtral LANGUAGES C)
endif()
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
# Workaround: CMake + GCC linker depfile generation fails for MODULE libraries
set(CMAKE_C_LINKER_DEPFILE_SUPPORTED FALSE)
# Build voxtral.c as a library
set(VOXTRAL_SOURCES
sources/voxtral.c/voxtral.c
sources/voxtral.c/voxtral_kernels.c
sources/voxtral.c/voxtral_audio.c
sources/voxtral.c/voxtral_encoder.c
sources/voxtral.c/voxtral_decoder.c
sources/voxtral.c/voxtral_tokenizer.c
sources/voxtral.c/voxtral_safetensors.c
)
# Metal GPU acceleration (macOS arm64 only)
if(USE_METAL)
# Generate embedded shader header from .metal source via xxd
add_custom_command(
OUTPUT ${CMAKE_CURRENT_SOURCE_DIR}/sources/voxtral.c/voxtral_shaders_source.h
COMMAND xxd -i voxtral_shaders.metal > voxtral_shaders_source.h
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/sources/voxtral.c
DEPENDS sources/voxtral.c/voxtral_shaders.metal
COMMENT "Generating embedded Metal shaders header"
)
list(APPEND VOXTRAL_SOURCES sources/voxtral.c/voxtral_metal.m)
set_source_files_properties(sources/voxtral.c/voxtral_metal.m PROPERTIES
COMPILE_FLAGS "-fobjc-arc"
)
endif()
add_library(govoxtral MODULE csrc/govoxtral.c ${VOXTRAL_SOURCES})
target_include_directories(govoxtral PRIVATE sources/voxtral.c csrc)
target_compile_options(govoxtral PRIVATE -O3 -ffast-math)
if(USE_METAL)
target_compile_definitions(govoxtral PRIVATE USE_BLAS USE_METAL ACCELERATE_NEW_LAPACK)
target_link_libraries(govoxtral PRIVATE
"-framework Accelerate"
"-framework Metal"
"-framework MetalPerformanceShaders"
"-framework MetalPerformanceShadersGraph"
"-framework Foundation"
"-framework AudioToolbox"
"-framework CoreFoundation"
m
)
# Ensure the generated shader header is built before compiling
target_sources(govoxtral PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}/sources/voxtral.c/voxtral_shaders_source.h
)
elseif(USE_OPENBLAS)
# Try to find OpenBLAS; use it if available, otherwise fall back to pure C
find_package(BLAS)
if(BLAS_FOUND)
target_compile_definitions(govoxtral PRIVATE USE_BLAS USE_OPENBLAS)
target_link_libraries(govoxtral PRIVATE ${BLAS_LIBRARIES} m)
target_include_directories(govoxtral PRIVATE /usr/include/openblas)
else()
message(WARNING "OpenBLAS requested but not found, building without BLAS")
target_link_libraries(govoxtral PRIVATE m)
endif()
elseif(APPLE)
# macOS without Metal: use Accelerate framework
target_compile_definitions(govoxtral PRIVATE USE_BLAS ACCELERATE_NEW_LAPACK)
target_link_libraries(govoxtral PRIVATE "-framework Accelerate" m)
else()
target_link_libraries(govoxtral PRIVATE m)
endif()
set_property(TARGET govoxtral PROPERTY C_STANDARD 11)
set_target_properties(govoxtral PROPERTIES LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR})

backend/go/voxtral/Makefile Normal file
View File

@@ -0,0 +1,107 @@
.NOTPARALLEL:
CMAKE_ARGS?=
BUILD_TYPE?=
NATIVE?=true
GOCMD?=go
GO_TAGS?=
JOBS?=$(shell nproc --ignore=1 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
# voxtral.c version
VOXTRAL_REPO?=https://github.com/antirez/voxtral.c
VOXTRAL_VERSION?=c9e8773a2042d67c637fc492c8a655c485354080
# Detect OS
UNAME_S := $(shell uname -s)
# Shared library extension
ifeq ($(UNAME_S),Darwin)
SO_EXT=dylib
else
SO_EXT=so
endif
SO_TARGET?=libgovoxtral.$(SO_EXT)
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
ifeq ($(NATIVE),false)
ifneq ($(UNAME_S),Darwin)
CMAKE_ARGS+=-DCMAKE_C_FLAGS="-march=x86-64"
endif
endif
ifeq ($(BUILD_TYPE),cublas)
CMAKE_ARGS+=-DUSE_OPENBLAS=OFF
else ifeq ($(BUILD_TYPE),hipblas)
CMAKE_ARGS+=-DUSE_OPENBLAS=OFF
else ifeq ($(BUILD_TYPE),metal)
CMAKE_ARGS+=-DUSE_OPENBLAS=OFF -DUSE_METAL=ON
else ifeq ($(UNAME_S),Darwin)
# Default on macOS: use Accelerate (no OpenBLAS needed)
CMAKE_ARGS+=-DUSE_OPENBLAS=OFF
else
CMAKE_ARGS+=-DUSE_OPENBLAS=ON
endif
# Single library target
ifeq ($(UNAME_S),Darwin)
VARIANT_TARGETS = libgovoxtral.dylib
else
VARIANT_TARGETS = libgovoxtral.so
endif
sources/voxtral.c:
mkdir -p sources/voxtral.c
cd sources/voxtral.c && \
git init && \
git remote add origin $(VOXTRAL_REPO) && \
git fetch origin && \
git checkout $(VOXTRAL_VERSION) && \
git submodule update --init --recursive --depth 1 --single-branch
voxtral: main.go govoxtral.go $(VARIANT_TARGETS)
CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o voxtral ./
package: voxtral
bash package.sh
build: package
clean: purge
rm -rf libgovoxtral.so libgovoxtral.dylib package sources/voxtral.c voxtral
purge:
rm -rf build*
# Build single library
ifeq ($(UNAME_S),Darwin)
libgovoxtral.dylib: sources/voxtral.c
$(MAKE) purge
$(info Building voxtral: darwin)
SO_TARGET=libgovoxtral.dylib NATIVE=true $(MAKE) libgovoxtral-custom
rm -rfv build*
else
libgovoxtral.so: sources/voxtral.c
$(MAKE) purge
$(info Building voxtral)
SO_TARGET=libgovoxtral.so $(MAKE) libgovoxtral-custom
rm -rfv build*
endif
libgovoxtral-custom: CMakeLists.txt csrc/govoxtral.c csrc/govoxtral.h
mkdir -p build-$(SO_TARGET) && \
cd build-$(SO_TARGET) && \
cmake .. $(CMAKE_ARGS) && \
cmake --build . --config Release -j$(JOBS) && \
cd .. && \
(mv build-$(SO_TARGET)/libgovoxtral.so ./$(SO_TARGET) 2>/dev/null || \
mv build-$(SO_TARGET)/libgovoxtral.dylib ./$(SO_TARGET) 2>/dev/null)
test: voxtral
@echo "Running voxtral tests..."
bash test.sh
@echo "voxtral tests completed."
all: voxtral package

View File

@@ -0,0 +1,62 @@
#include "govoxtral.h"
#include "voxtral.h"
#include "voxtral_audio.h"
#ifdef USE_METAL
#include "voxtral_metal.h"
#endif
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
static vox_ctx_t *ctx = NULL;
static char *last_result = NULL;
static int metal_initialized = 0;
int load_model(const char *model_dir) {
if (ctx != NULL) {
vox_free(ctx);
ctx = NULL;
}
#ifdef USE_METAL
if (!metal_initialized) {
vox_metal_init();
metal_initialized = 1;
}
#endif
ctx = vox_load(model_dir);
if (ctx == NULL) {
fprintf(stderr, "error: failed to load voxtral model from %s\n", model_dir);
return 1;
}
return 0;
}
const char *transcribe(const char *wav_path) {
if (ctx == NULL) {
fprintf(stderr, "error: model not loaded\n");
return "";
}
if (last_result != NULL) {
free(last_result);
last_result = NULL;
}
last_result = vox_transcribe(ctx, wav_path);
if (last_result == NULL) {
fprintf(stderr, "error: transcription failed for %s\n", wav_path);
return "";
}
return last_result;
}
void free_result(void) {
if (last_result != NULL) {
free(last_result);
last_result = NULL;
}
}

View File

@@ -0,0 +1,8 @@
#ifndef GOVOXTRAL_H
#define GOVOXTRAL_H
extern int load_model(const char *model_dir);
extern const char *transcribe(const char *wav_path);
extern void free_result(void);
#endif /* GOVOXTRAL_H */

View File

@@ -0,0 +1,60 @@
package main
import (
"fmt"
"os"
"strings"
"github.com/mudler/LocalAI/pkg/grpc/base"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/utils"
)
var (
CppLoadModel func(modelDir string) int
CppTranscribe func(wavPath string) string
CppFreeResult func()
)
type Voxtral struct {
base.SingleThread
}
func (v *Voxtral) Load(opts *pb.ModelOptions) error {
if ret := CppLoadModel(opts.ModelFile); ret != 0 {
return fmt.Errorf("failed to load Voxtral model from %s", opts.ModelFile)
}
return nil
}
func (v *Voxtral) AudioTranscription(opts *pb.TranscriptRequest) (pb.TranscriptResult, error) {
dir, err := os.MkdirTemp("", "voxtral")
if err != nil {
return pb.TranscriptResult{}, err
}
defer os.RemoveAll(dir)
convertedPath := dir + "/converted.wav"
if err := utils.AudioToWav(opts.Dst, convertedPath); err != nil {
return pb.TranscriptResult{}, err
}
result := strings.Clone(CppTranscribe(convertedPath))
CppFreeResult()
text := strings.TrimSpace(result)
segments := []*pb.TranscriptSegment{}
if text != "" {
segments = append(segments, &pb.TranscriptSegment{
Id: 0,
Text: text,
})
}
return pb.TranscriptResult{
Segments: segments,
Text: text,
}, nil
}

View File

@@ -0,0 +1,53 @@
package main
// Note: this is started internally by LocalAI and a server is allocated for each model
import (
"flag"
"os"
"runtime"
"github.com/ebitengine/purego"
grpc "github.com/mudler/LocalAI/pkg/grpc"
)
var (
addr = flag.String("addr", "localhost:50051", "the address to connect to")
)
type LibFuncs struct {
FuncPtr any
Name string
}
func main() {
// Get library name from environment variable, default to fallback
libName := os.Getenv("VOXTRAL_LIBRARY")
if libName == "" {
if runtime.GOOS == "darwin" {
libName = "./libgovoxtral.dylib"
} else {
libName = "./libgovoxtral.so"
}
}
gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL)
if err != nil {
panic(err)
}
libFuncs := []LibFuncs{
{&CppLoadModel, "load_model"},
{&CppTranscribe, "transcribe"},
{&CppFreeResult, "free_result"},
}
for _, lf := range libFuncs {
purego.RegisterLibFunc(lf.FuncPtr, gosd, lf.Name)
}
flag.Parse()
if err := grpc.StartServer(*addr, &Voxtral{}); err != nil {
panic(err)
}
}

View File

@@ -0,0 +1,68 @@
#!/bin/bash
# Script to copy the appropriate libraries based on architecture
set -e
CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."
# Create lib directory
mkdir -p $CURDIR/package/lib
cp -avf $CURDIR/voxtral $CURDIR/package/
cp -fv $CURDIR/libgovoxtral-*.so $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/libgovoxtral-*.dylib $CURDIR/package/ 2>/dev/null || true
cp -fv $CURDIR/run.sh $CURDIR/package/
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
# OpenBLAS if available
if [ -f /usr/lib/x86_64-linux-gnu/libopenblas.so.0 ]; then
cp -arfLv /usr/lib/x86_64-linux-gnu/libopenblas.so.0 $CURDIR/package/lib/
fi
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
# ARM64 architecture
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 $CURDIR/package/lib/ld.so
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
# OpenBLAS if available
if [ -f /usr/lib/aarch64-linux-gnu/libopenblas.so.0 ]; then
cp -arfLv /usr/lib/aarch64-linux-gnu/libopenblas.so.0 $CURDIR/package/lib/
fi
elif [ $(uname -s) = "Darwin" ]; then
echo "Detected Darwin — system frameworks linked dynamically, no bundled libs needed"
else
echo "Error: Could not detect architecture"
exit 1
fi
# Package GPU libraries based on BUILD_TYPE
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi
echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/

backend/go/voxtral/run.sh Normal file
View File

@@ -0,0 +1,49 @@
#!/bin/bash
set -ex
# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")
cd /
echo "CPU info:"
if [ "$(uname)" != "Darwin" ]; then
grep -e "model\sname" /proc/cpuinfo | head -1
grep -e "flags" /proc/cpuinfo | head -1
fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgovoxtral-fallback.dylib"
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgovoxtral-fallback.so"
if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/libgovoxtral-avx.so ]; then
LIBRARY="$CURDIR/libgovoxtral-avx.so"
fi
fi
if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/libgovoxtral-avx2.so ]; then
LIBRARY="$CURDIR/libgovoxtral-avx2.so"
fi
fi
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
fi
export VOXTRAL_LIBRARY=$LIBRARY
# If there is a lib/ld.so, use it (Linux only)
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using library: $LIBRARY"
exec $CURDIR/lib/ld.so $CURDIR/voxtral "$@"
fi
echo "Using library: $LIBRARY"
exec $CURDIR/voxtral "$@"

View File

@@ -0,0 +1,48 @@
#!/bin/bash
set -e
CURDIR=$(dirname "$(realpath $0)")
echo "Running voxtral backend tests..."
# The test requires:
# - VOXTRAL_MODEL_DIR: path to directory containing consolidated.safetensors + tekken.json
# - VOXTRAL_BINARY: path to the voxtral binary (defaults to ./voxtral)
#
# Tests that require the model will be skipped if VOXTRAL_MODEL_DIR is not set.
cd "$CURDIR"
export VOXTRAL_MODEL_DIR="${VOXTRAL_MODEL_DIR:-./voxtral-model}"
if [ ! -d "$VOXTRAL_MODEL_DIR" ]; then
echo "Creating voxtral-model directory for tests..."
mkdir -p "$VOXTRAL_MODEL_DIR"
MODEL_ID="mistralai/Voxtral-Mini-4B-Realtime-2602"
echo "Model: ${MODEL_ID}"
echo ""
# Files to download
FILES=(
"consolidated.safetensors"
"params.json"
"tekken.json"
)
BASE_URL="https://huggingface.co/${MODEL_ID}/resolve/main"
for file in "${FILES[@]}"; do
dest="${VOXTRAL_MODEL_DIR}/${file}"
if [ -f "${dest}" ]; then
echo " [skip] ${file} (already exists)"
else
echo " [download] ${file}..."
curl -L -o "${dest}" "${BASE_URL}/${file}" --progress-bar
echo " [done] ${file}"
fi
done
fi
# Run Go tests
go test -v -timeout 300s ./...
echo "All voxtral tests passed."

View File

@@ -0,0 +1,201 @@
package main
import (
"context"
"fmt"
"io"
"net/http"
"os"
"os/exec"
"path/filepath"
"strings"
"testing"
"time"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
)
const (
testAddr = "localhost:50051"
sampleAudio = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
startupWait = 5 * time.Second
)
func skipIfNoModel(t *testing.T) string {
t.Helper()
modelDir := os.Getenv("VOXTRAL_MODEL_DIR")
if modelDir == "" {
t.Skip("VOXTRAL_MODEL_DIR not set, skipping test (set to voxtral model directory)")
}
if _, err := os.Stat(filepath.Join(modelDir, "consolidated.safetensors")); os.IsNotExist(err) {
t.Skipf("Model file not found in %s, skipping", modelDir)
}
return modelDir
}
func startServer(t *testing.T) *exec.Cmd {
t.Helper()
binary := os.Getenv("VOXTRAL_BINARY")
if binary == "" {
binary = "./voxtral"
}
if _, err := os.Stat(binary); os.IsNotExist(err) {
t.Skipf("Backend binary not found at %s, skipping", binary)
}
cmd := exec.Command(binary, "--addr", testAddr)
cmd.Stdout = os.Stderr
cmd.Stderr = os.Stderr
if err := cmd.Start(); err != nil {
t.Fatalf("Failed to start server: %v", err)
}
time.Sleep(startupWait)
return cmd
}
func stopServer(cmd *exec.Cmd) {
if cmd != nil && cmd.Process != nil {
cmd.Process.Kill()
cmd.Wait()
}
}
func dialGRPC(t *testing.T) *grpc.ClientConn {
t.Helper()
conn, err := grpc.Dial(testAddr,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithDefaultCallOptions(
grpc.MaxCallRecvMsgSize(50*1024*1024),
grpc.MaxCallSendMsgSize(50*1024*1024),
),
)
if err != nil {
t.Fatalf("Failed to dial gRPC: %v", err)
}
return conn
}
func downloadFile(url, dest string) error {
resp, err := http.Get(url)
if err != nil {
return fmt.Errorf("HTTP GET failed: %w", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
return fmt.Errorf("bad status: %s", resp.Status)
}
f, err := os.Create(dest)
if err != nil {
return err
}
defer f.Close()
_, err = io.Copy(f, resp.Body)
return err
}
func TestServerHealth(t *testing.T) {
cmd := startServer(t)
defer stopServer(cmd)
conn := dialGRPC(t)
defer conn.Close()
client := pb.NewBackendClient(conn)
resp, err := client.Health(context.Background(), &pb.HealthMessage{})
if err != nil {
t.Fatalf("Health check failed: %v", err)
}
if string(resp.Message) != "OK" {
t.Fatalf("Expected OK, got %s", string(resp.Message))
}
}
func TestLoadModel(t *testing.T) {
modelDir := skipIfNoModel(t)
cmd := startServer(t)
defer stopServer(cmd)
conn := dialGRPC(t)
defer conn.Close()
client := pb.NewBackendClient(conn)
resp, err := client.LoadModel(context.Background(), &pb.ModelOptions{
ModelFile: modelDir,
})
if err != nil {
t.Fatalf("LoadModel failed: %v", err)
}
if !resp.Success {
t.Fatalf("LoadModel returned failure: %s", resp.Message)
}
}
func TestAudioTranscription(t *testing.T) {
modelDir := skipIfNoModel(t)
tmpDir, err := os.MkdirTemp("", "voxtral-test")
if err != nil {
t.Fatal(err)
}
defer os.RemoveAll(tmpDir)
// Download sample audio (the English ASR sample referenced by sampleAudio)
audioFile := filepath.Join(tmpDir, "sample.wav")
t.Log("Downloading sample audio...")
if err := downloadFile(sampleAudio, audioFile); err != nil {
t.Fatalf("Failed to download sample audio: %v", err)
}
cmd := startServer(t)
defer stopServer(cmd)
conn := dialGRPC(t)
defer conn.Close()
client := pb.NewBackendClient(conn)
// Load model
loadResp, err := client.LoadModel(context.Background(), &pb.ModelOptions{
ModelFile: modelDir,
})
if err != nil {
t.Fatalf("LoadModel failed: %v", err)
}
if !loadResp.Success {
t.Fatalf("LoadModel returned failure: %s", loadResp.Message)
}
// Transcribe
transcriptResp, err := client.AudioTranscription(context.Background(), &pb.TranscriptRequest{
Dst: audioFile,
})
if err != nil {
t.Fatalf("AudioTranscription failed: %v", err)
}
if transcriptResp == nil {
t.Fatal("AudioTranscription returned nil")
}
t.Logf("Transcribed text: %s", transcriptResp.Text)
t.Logf("Number of segments: %d", len(transcriptResp.Segments))
if transcriptResp.Text == "" {
t.Fatal("Transcription returned empty text")
}
allText := strings.ToLower(transcriptResp.Text)
for _, seg := range transcriptResp.Segments {
allText += " " + strings.ToLower(seg.Text)
}
t.Logf("All text: %s", allText)
if !strings.Contains(allText, "big") {
t.Errorf("Expected 'big' in transcription, got: %s", allText)
}
// The sample audio should contain recognizable speech
if len(allText) < 10 {
t.Errorf("Transcription too short: %q", allText)
}
}

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=941bdabbe4561bc6de68981aea01bc5ab05781c5
WHISPER_CPP_VERSION?=764482c3175d9c3bc6089c1ec84df7d1b9537d83
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -88,19 +88,19 @@ ifeq ($(UNAME_S),Linux)
libgowhisper-avx.so: sources/whisper.cpp
$(MAKE) purge
$(info ${GREEN}I whisper build info:avx${RESET})
SO_TARGET=libgowhisper-avx.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) libgowhisper-custom
SO_TARGET=libgowhisper-avx.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgowhisper-custom
rm -rfv build*
libgowhisper-avx2.so: sources/whisper.cpp
$(MAKE) purge
$(info ${GREEN}I whisper build info:avx2${RESET})
SO_TARGET=libgowhisper-avx2.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on" $(MAKE) libgowhisper-custom
SO_TARGET=libgowhisper-avx2.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgowhisper-custom
rm -rfv build*
libgowhisper-avx512.so: sources/whisper.cpp
$(MAKE) purge
$(info ${GREEN}I whisper build info:avx512${RESET})
SO_TARGET=libgowhisper-avx512.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on" $(MAKE) libgowhisper-custom
SO_TARGET=libgowhisper-avx512.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on -DGGML_BMI2=on" $(MAKE) libgowhisper-custom
rm -rfv build*
endif
@@ -108,7 +108,7 @@ endif
libgowhisper-fallback.so: sources/whisper.cpp
$(MAKE) purge
$(info ${GREEN}I whisper build info:fallback${RESET})
SO_TARGET=libgowhisper-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off" $(MAKE) libgowhisper-custom
SO_TARGET=libgowhisper-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgowhisper-custom
rm -rfv build*
libgowhisper-custom: CMakeLists.txt gowhisper.cpp gowhisper.h

View File

@@ -56,6 +56,21 @@
nvidia-cuda-12: "cuda12-whisper"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-whisper"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-whisper"
- &voxtral
name: "voxtral"
alias: "voxtral"
license: mit
description: |
Voxtral Realtime 4B Pure C speech-to-text inference engine
urls:
- https://github.com/mudler/voxtral.c
tags:
- audio-transcription
- CPU
- Metal
capabilities:
default: "cpu-voxtral"
metal-darwin-arm64: "metal-voxtral"
- &stablediffusionggml
name: "stablediffusion-ggml"
alias: "stablediffusion-ggml"
@@ -2594,3 +2609,24 @@
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-pocket-tts"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-pocket-tts
## voxtral
- !!merge <<: *voxtral
name: "cpu-voxtral"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-voxtral"
mirrors:
- localai/localai-backends:latest-cpu-voxtral
- !!merge <<: *voxtral
name: "cpu-voxtral-development"
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-voxtral"
mirrors:
- localai/localai-backends:master-cpu-voxtral
- !!merge <<: *voxtral
name: "metal-voxtral"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-voxtral"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-voxtral
- !!merge <<: *voxtral
name: "metal-voxtral-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-voxtral"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-voxtral

View File

@@ -115,6 +115,7 @@ Available pipelines: AnimateDiffPipeline, AnimateDiffVideoToVideoPipeline, ...
| Variable | Default | Description |
|----------|---------|-------------|
| `COMPEL` | `0` | Enable Compel for prompt weighting |
| `SD_EMBED` | `0` | Enable sd_embed for prompt weighting |
| `XPU` | `0` | Enable Intel XPU support |
| `CLIPSKIP` | `1` | Enable CLIP skip support |
| `SAFETENSORS` | `1` | Use safetensors format |

View File

@@ -40,6 +40,7 @@ from compel import Compel, ReturnedEmbeddingsType
from optimum.quanto import freeze, qfloat8, quantize
from transformers import T5EncoderModel
from safetensors.torch import load_file
from sd_embed.embedding_funcs import get_weighted_text_embeddings_sd15, get_weighted_text_embeddings_sdxl, get_weighted_text_embeddings_sd3, get_weighted_text_embeddings_flux1
# Import LTX-2 specific utilities
from diffusers.pipelines.ltx2.export_utils import encode_video as ltx2_encode_video
@@ -47,6 +48,7 @@ from diffusers import LTX2VideoTransformer3DModel, GGUFQuantizationConfig
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
COMPEL = os.environ.get("COMPEL", "0") == "1"
SD_EMBED = os.environ.get("SD_EMBED", "0") == "1"
XPU = os.environ.get("XPU", "0") == "1"
CLIPSKIP = os.environ.get("CLIPSKIP", "1") == "1"
SAFETENSORS = os.environ.get("SAFETENSORS", "1") == "1"
@@ -737,6 +739,51 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
kwargs["prompt_embeds"] = conditioning
kwargs["pooled_prompt_embeds"] = pooled
# pass the kwargs dictionary to the self.pipe method
image = self.pipe(
guidance_scale=self.cfg_scale,
**kwargs
).images[0]
elif SD_EMBED:
if self.PipelineType == "StableDiffusionPipeline":
(
kwargs["prompt_embeds"],
kwargs["negative_prompt_embeds"],
) = get_weighted_text_embeddings_sd15(
pipe = self.pipe,
prompt = prompt,
neg_prompt = request.negative_prompt if hasattr(request, 'negative_prompt') else None,
)
if self.PipelineType == "StableDiffusionXLPipeline":
(
kwargs["prompt_embeds"],
kwargs["negative_prompt_embeds"],
kwargs["pooled_prompt_embeds"],
kwargs["negative_pooled_prompt_embeds"],
) = get_weighted_text_embeddings_sdxl(
pipe = self.pipe,
prompt = prompt,
neg_prompt = request.negative_prompt if hasattr(request, 'negative_prompt') else None
)
if self.PipelineType == "StableDiffusion3Pipeline":
(
kwargs["prompt_embeds"],
kwargs["negative_prompt_embeds"],
kwargs["pooled_prompt_embeds"],
kwargs["negative_pooled_prompt_embeds"],
) = get_weighted_text_embeddings_sd3(
pipe = self.pipe,
prompt = prompt,
neg_prompt = request.negative_prompt if hasattr(request, 'negative_prompt') else None
)
if self.PipelineType == "FluxTransformer2DModel":
(
kwargs["prompt_embeds"],
kwargs["pooled_prompt_embeds"],
) = get_weighted_text_embeddings_flux1(
pipe = self.pipe,
prompt = prompt,
)
image = self.pipe(
guidance_scale=self.cfg_scale,
**kwargs

View File

@@ -5,6 +5,7 @@ transformers
torchvision==0.22.1
accelerate
compel
git+https://github.com/xhinker/sd_embed
peft
sentencepiece
torch==2.7.1

View File

@@ -5,6 +5,7 @@ transformers
torchvision
accelerate
compel
git+https://github.com/xhinker/sd_embed
peft
sentencepiece
torch

View File

@@ -5,6 +5,7 @@ transformers
torchvision
accelerate
compel
git+https://github.com/xhinker/sd_embed
peft
sentencepiece
torch

View File

@@ -8,6 +8,7 @@ opencv-python
transformers
accelerate
compel
git+https://github.com/xhinker/sd_embed
peft
sentencepiece
optimum-quanto

View File

@@ -4,6 +4,7 @@ git+https://github.com/huggingface/diffusers
transformers
accelerate
compel
git+https://github.com/xhinker/sd_embed
peft
optimum-quanto
numpy<2

View File

@@ -4,6 +4,7 @@ git+https://github.com/huggingface/diffusers
transformers
accelerate
compel
git+https://github.com/xhinker/sd_embed
peft
optimum-quanto
numpy<2

View File

@@ -5,6 +5,7 @@ opencv-python
transformers
accelerate
compel
git+https://github.com/xhinker/sd_embed
peft
sentencepiece
optimum-quanto

View File

@@ -10,7 +10,11 @@ import sys
import os
import backend_pb2
import backend_pb2_grpc
import moonshine_onnx
from moonshine_voice import (
Transcriber,
get_model_for_language,
load_wav_file,
)
import grpc
@@ -25,16 +29,49 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
"""
BackendServicer is the class that implements the gRPC service
"""
def __init__(self):
self.transcriber = None
self.model_name = None
def Health(self, request, context):
return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
def LoadModel(self, request, context):
try:
print("Preparing models, please wait", file=sys.stderr)
# Store the model name for use in transcription
# Model name format: e.g., "moonshine/tiny"
self.model_name = request.Model
print(f"Model name set to: {self.model_name}", file=sys.stderr)
# Default values
language = "en"
model_arch = None
# Parse options from request
options = request.Options
self.options = {}
# The options are a list of strings in this form optname:optvalue
for opt in options:
if ":" not in opt:
continue
key, value = opt.split(":", 1)
self.options[key] = value
print(f"Options: {self.options}", file=sys.stderr)
# Extract language and model_arch from options
if "language" in self.options:
language = self.options["language"]
if "model_arch" in self.options:
model_arch = self.options["model_arch"]
# Get the model path and architecture
model_path, model_arch = get_model_for_language(language, model_arch)
print(f"Loading model: {model_path} with architecture: {model_arch} for language: {language}", file=sys.stderr)
# Initialize the transcriber
self.transcriber = Transcriber(model_path=model_path, model_arch=model_arch)
print("Model loaded successfully", file=sys.stderr)
except Exception as err:
return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
return backend_pb2.Result(message="Model loaded successfully", success=True)
@@ -43,33 +80,44 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
resultSegments = []
text = ""
try:
# moonshine_onnx.transcribe returns a list of strings
transcriptions = moonshine_onnx.transcribe(request.dst, self.model_name)
if self.transcriber is None:
raise Exception("Model not loaded. Call LoadModel first.")
# Load the audio file
audio_data, sample_rate = load_wav_file(request.dst)
print(f"Loaded audio file: {request.dst} with sample rate: {sample_rate}", file=sys.stderr)
# Transcribe without streaming
transcript = self.transcriber.transcribe_without_streaming(
audio_data, sample_rate=sample_rate, flags=0
)
# Process transcript lines
full_text_parts = []
for idx, line in enumerate(transcript.lines):
line_text = line.text.strip()
full_text_parts.append(line_text)
# Create segment with timing information
start_ms = int(line.start_time * 1000)
end_ms = int((line.start_time + line.duration) * 1000)
resultSegments.append(backend_pb2.TranscriptSegment(
id=idx,
start=start_ms,
end=end_ms,
text=line_text
))
print(f"Segment {idx}: [{line.start_time:.2f}s - {line.start_time + line.duration:.2f}s] {line_text}", file=sys.stderr)
# Combine all transcriptions into a single text
if isinstance(transcriptions, list):
text = " ".join(transcriptions)
# Create segments for each transcription in the list
for id, trans in enumerate(transcriptions):
# Since moonshine doesn't provide timing info, we'll create a single segment
# with id and text, using approximate timing
resultSegments.append(backend_pb2.TranscriptSegment(
id=id,
start=0,
end=0,
text=trans
))
else:
# Handle case where it's not a list (shouldn't happen, but be safe)
text = str(transcriptions)
resultSegments.append(backend_pb2.TranscriptSegment(
id=0,
start=0,
end=0,
text=text
))
text = " ".join(full_text_parts)
except Exception as err:
print(f"Unexpected {err=}, {type(err)=}", file=sys.stderr)
import traceback
traceback.print_exc()
return backend_pb2.TranscriptResult(segments=[], text="")
return backend_pb2.TranscriptResult(segments=resultSegments, text=text)


@@ -1,4 +1,4 @@
grpcio==1.71.0
protobuf
grpcio-tools
useful-moonshine-onnx@git+https://git@github.com/moonshine-ai/moonshine.git#subdirectory=moonshine-onnx
moonshine-voice


@@ -1,4 +1,4 @@
grpcio==1.71.0
protobuf
grpcio-tools
useful-moonshine-onnx@git+https://git@github.com/moonshine-ai/moonshine.git#subdirectory=moonshine-onnx
moonshine-voice


@@ -112,7 +112,7 @@ class TestBackendServicer(unittest.TestCase):
self.assertGreaterEqual(len(transcript_response.segments), 0)
# Verify the transcription contains the expected text
expected_text = "This is the micro machine man presenting the most midget miniature"
expected_text = "This is the micro machine man"
self.assertIn(
expected_text.lower(),
transcript_response.text.lower(),


@@ -1,3 +1,4 @@
setuptools
grpcio==1.76.0
protobuf
certifi


@@ -76,42 +76,35 @@ func (lo *LoadOptions) Apply(options ...ConfigLoaderOption) {
}
}
// TODO: either in the next PR or the next commit, I want to merge these down into a single function that looks at the first few characters of the file to determine if we need to deserialize to []BackendConfig or BackendConfig
func readMultipleModelConfigsFromFile(file string, opts ...ConfigLoaderOption) ([]*ModelConfig, error) {
c := &[]*ModelConfig{}
// readModelConfigsFromFile reads a config file that may contain either a single
// ModelConfig or an array of ModelConfigs. It tries to unmarshal as an array first,
// then falls back to a single config if that fails.
func readModelConfigsFromFile(file string, opts ...ConfigLoaderOption) ([]*ModelConfig, error) {
f, err := os.ReadFile(file)
if err != nil {
return nil, fmt.Errorf("readMultipleModelConfigsFromFile cannot read config file %q: %w", file, err)
}
if err := yaml.Unmarshal(f, c); err != nil {
return nil, fmt.Errorf("readMultipleModelConfigsFromFile cannot unmarshal config file %q: %w", file, err)
return nil, fmt.Errorf("readModelConfigsFromFile cannot read config file %q: %w", file, err)
}
for _, cc := range *c {
cc.modelConfigFile = file
cc.SetDefaults(opts...)
// Try to unmarshal as array first
var configs []*ModelConfig
if err := yaml.Unmarshal(f, &configs); err == nil && len(configs) > 0 {
for _, cc := range configs {
cc.modelConfigFile = file
cc.SetDefaults(opts...)
}
return configs, nil
}
return *c, nil
}
func readModelConfigFromFile(file string, opts ...ConfigLoaderOption) (*ModelConfig, error) {
lo := &LoadOptions{}
lo.Apply(opts...)
// Fall back to single config
c := &ModelConfig{}
f, err := os.ReadFile(file)
if err != nil {
return nil, fmt.Errorf("readModelConfigFromFile cannot read config file %q: %w", file, err)
}
if err := yaml.Unmarshal(f, c); err != nil {
return nil, fmt.Errorf("readModelConfigFromFile cannot unmarshal config file %q: %w", file, err)
return nil, fmt.Errorf("readModelConfigsFromFile cannot unmarshal config file %q: %w", file, err)
}
c.SetDefaults(opts...)
c.modelConfigFile = file
return c, nil
c.SetDefaults(opts...)
return []*ModelConfig{c}, nil
}
// Load a config file for a model
@@ -163,7 +156,7 @@ func (bcl *ModelConfigLoader) LoadModelConfigFileByNameDefaultOptions(modelName
func (bcl *ModelConfigLoader) LoadMultipleModelConfigsSingleFile(file string, opts ...ConfigLoaderOption) error {
bcl.Lock()
defer bcl.Unlock()
c, err := readMultipleModelConfigsFromFile(file, opts...)
c, err := readModelConfigsFromFile(file, opts...)
if err != nil {
return fmt.Errorf("cannot load config file: %w", err)
}
@@ -181,11 +174,18 @@ func (bcl *ModelConfigLoader) LoadMultipleModelConfigsSingleFile(file string, op
func (bcl *ModelConfigLoader) ReadModelConfig(file string, opts ...ConfigLoaderOption) error {
bcl.Lock()
defer bcl.Unlock()
c, err := readModelConfigFromFile(file, opts...)
configs, err := readModelConfigsFromFile(file, opts...)
if err != nil {
return fmt.Errorf("ReadModelConfig cannot read config file %q: %w", file, err)
}
if len(configs) == 0 {
return fmt.Errorf("ReadModelConfig: no configs found in file %q", file)
}
if len(configs) > 1 {
xlog.Warn("ReadModelConig: read more than one config from file, only using first", "file", file, "configs", len(configs))
}
c := configs[0]
if valid, err := c.Validate(); valid {
bcl.configs[c.Name] = *c
} else {
@@ -375,15 +375,23 @@ func (bcl *ModelConfigLoader) LoadModelConfigsFromPath(path string, opts ...Conf
strings.HasPrefix(file.Name(), ".") {
continue
}
c, err := readModelConfigFromFile(filepath.Join(path, file.Name()), opts...)
filePath := filepath.Join(path, file.Name())
// Read config(s) - handles both single and array formats
configs, err := readModelConfigsFromFile(filePath, opts...)
if err != nil {
xlog.Error("LoadModelConfigsFromPath cannot read config file", "error", err, "File Name", file.Name())
continue
}
if valid, validationErr := c.Validate(); valid {
bcl.configs[c.Name] = *c
} else {
xlog.Error("config is not valid", "error", validationErr, "Name", c.Name)
// Validate and store each config
for _, c := range configs {
if valid, validationErr := c.Validate(); valid {
bcl.configs[c.Name] = *c
} else {
xlog.Error("config is not valid", "error", validationErr, "Name", c.Name)
}
}
}
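The refactor above collapses readMultipleModelConfigsFromFile and readModelConfigFromFile into a single readModelConfigsFromFile that accepts either YAML shape: an array of model configs or a single one. A minimal, self-contained sketch of the same try-array-first, fall-back-to-single-document logic, assuming gopkg.in/yaml.v3 and a simplified Config struct in place of the real ModelConfig type:

package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

// Config is a stand-in for ModelConfig, trimmed down for illustration.
type Config struct {
	Name    string `yaml:"name"`
	Backend string `yaml:"backend"`
}

// parseConfigs mirrors the fallback: decode as a YAML array first, and if
// that fails (or yields nothing), decode the input as a single document.
func parseConfigs(data []byte) ([]*Config, error) {
	var many []*Config
	if err := yaml.Unmarshal(data, &many); err == nil && len(many) > 0 {
		return many, nil
	}
	single := &Config{}
	if err := yaml.Unmarshal(data, single); err != nil {
		return nil, fmt.Errorf("cannot unmarshal config: %w", err)
	}
	return []*Config{single}, nil
}

func main() {
	array := []byte("- name: a\n  backend: llama-cpp\n- name: b\n  backend: whisper\n")
	single := []byte("name: c\nbackend: piper\n")
	for _, in := range [][]byte{array, single} {
		cfgs, err := parseConfigs(in)
		fmt.Println(len(cfgs), err) // prints "2 <nil>" then "1 <nil>"
	}
}

Callers such as ReadModelConfig and LoadModelConfigsFromPath then only deal with a slice, which is why the single-config path below warns and picks the first element when a multi-config file is given where one config is expected.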


@@ -25,7 +25,8 @@ known_usecases:
- COMPLETION
`)
Expect(err).ToNot(HaveOccurred())
config, err := readModelConfigFromFile(tmp.Name())
configs, err := readModelConfigsFromFile(tmp.Name())
config := configs[0]
Expect(err).To(BeNil())
Expect(config).ToNot(BeNil())
valid, err := config.Validate()
@@ -43,7 +44,8 @@ backend: "foo-bar"
parameters:
model: "foo-bar"`)
Expect(err).ToNot(HaveOccurred())
config, err := readModelConfigFromFile(tmp.Name())
configs, err := readModelConfigsFromFile(tmp.Name())
config := configs[0]
Expect(err).To(BeNil())
Expect(config).ToNot(BeNil())
// two configs in config.yaml
@@ -62,7 +64,8 @@ parameters:
defer os.Remove(tmp.Name())
_, err = io.Copy(tmp, resp.Body)
Expect(err).To(BeNil())
config, err = readModelConfigFromFile(tmp.Name())
configs, err = readModelConfigsFromFile(tmp.Name())
config = configs[0]
Expect(err).To(BeNil())
Expect(config).ToNot(BeNil())
// two configs in config.yaml
@@ -188,7 +191,8 @@ mcp:
}
}`)
Expect(err).ToNot(HaveOccurred())
config, err := readModelConfigFromFile(tmp.Name())
configs, err := readModelConfigsFromFile(tmp.Name())
config := configs[0]
Expect(err).To(BeNil())
Expect(config).ToNot(BeNil())
valid, err := config.Validate()
@@ -218,7 +222,8 @@ mcp:
}
}`)
Expect(err).ToNot(HaveOccurred())
config, err := readModelConfigFromFile(tmp.Name())
configs, err := readModelConfigsFromFile(tmp.Name())
config := configs[0]
Expect(err).To(BeNil())
Expect(config).ToNot(BeNil())
valid, err := config.Validate()


@@ -16,7 +16,7 @@ var _ = Describe("Test cases for config related functions", func() {
Context("Test Read configuration functions", func() {
configFile = os.Getenv("CONFIG_FILE")
It("Test readConfigFile", func() {
config, err := readMultipleModelConfigsFromFile(configFile)
config, err := readModelConfigsFromFile(configFile)
Expect(err).To(BeNil())
Expect(config).ToNot(BeNil())
// two configs in config.yaml


@@ -109,10 +109,10 @@ func API(application *application.Application) (*echo.Echo, error) {
res := c.Response()
err := next(c)
// Fix for #7989: Reduce log verbosity of Web UI polling
// If the path is /api/operations and the request was successful (200),
// Fix for #7989: Reduce log verbosity of Web UI polling and resources API
// If the path is /api/operations or /api/resources and the request was successful (200),
// we log it at DEBUG level (hidden by default) instead of INFO.
if req.URL.Path == "/api/operations" && res.Status == 200 {
if (req.URL.Path == "/api/operations" || req.URL.Path == "/api/resources") && res.Status == 200 {
xlog.Debug("HTTP request", "method", req.Method, "path", req.URL.Path, "status", res.Status)
} else {
xlog.Info("HTTP request", "method", req.Method, "path", req.URL.Path, "status", res.Status)


@@ -336,6 +336,7 @@ var _ = Describe("API test", func() {
Name: "bert",
URL: bertEmbeddingsURL,
},
Overrides: map[string]interface{}{"backend": "llama-cpp"},
},
{
Metadata: gallery.Metadata{
@@ -953,7 +954,8 @@ parameters:
It("returns the models list", func() {
models, err := client.ListModels(context.TODO())
Expect(err).ToNot(HaveOccurred())
Expect(len(models.Models)).To(Equal(7)) // If "config.yaml" should be included, this should be 8?
// A model called "bert" can be present in the model directory depending on the order of the tests
Expect(len(models.Models)).To(BeNumerically(">=", 8))
})
It("can generate completions via ggml", func() {
if runtime.GOOS != "linux" {


@@ -23,10 +23,15 @@ import (
"github.com/mudler/LocalAI/core/backend"
model "github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/utils"
"github.com/mudler/xlog"
)
func downloadFile(url string) (string, error) {
if err := utils.ValidateExternalURL(url); err != nil {
return "", fmt.Errorf("URL validation failed: %w", err)
}
// Get the data
resp, err := http.Get(url)
if err != nil {


@@ -32,13 +32,26 @@ import (
)
const (
// XXX: Presently it seems all ASR/VAD backends use 16kHz. If a backend uses 24kHz then it will likely still work, but with reduced performance
localSampleRate = 16000
remoteSampleRate = 24000
defaultRemoteSampleRate = 24000
)
// A model can be "emulated", that is: transcribe audio to text -> feed the text to the LLM -> generate audio from the result
// If the model supports audio-to-audio directly, we will use the specific gRPC calls instead
// LockedWebsocket wraps a websocket connection with a mutex for safe concurrent writes
type LockedWebsocket struct {
*websocket.Conn
sync.Mutex
}
func (l *LockedWebsocket) WriteMessage(messageType int, data []byte) error {
l.Lock()
defer l.Unlock()
return l.Conn.WriteMessage(messageType, data)
}
// Session represents a single WebSocket connection and its state
type Session struct {
ID string
@@ -58,7 +71,8 @@ type Session struct {
DefaultConversationID string
ModelInterface Model
// The pipeline model config or the config for an any-to-any model
ModelConfig *config.ModelConfig
ModelConfig *config.ModelConfig
InputSampleRate int
}
func (s *Session) FromClient(session *types.SessionUnion) {
@@ -162,7 +176,8 @@ func Realtime(application *application.Application) echo.HandlerFunc {
}
func registerRealtime(application *application.Application, model string) func(c *websocket.Conn) {
return func(c *websocket.Conn) {
return func(conn *websocket.Conn) {
c := &LockedWebsocket{Conn: conn}
evaluator := application.TemplatesEvaluator()
xlog.Debug("Realtime WebSocket connection established", "address", c.RemoteAddr().String(), "model", model)
@@ -183,14 +198,13 @@ func registerRealtime(application *application.Application, model string) func(c
}
sttModel := cfg.Pipeline.Transcription
ttsModel := cfg.Pipeline.TTS
sessionID := generateSessionID()
session := &Session{
ID: sessionID,
TranscriptionOnly: false,
Model: model,
Voice: ttsModel,
Voice: cfg.TTSConfig.Voice,
ModelConfig: cfg,
TurnDetection: &types.TurnDetectionUnion{
ServerVad: &types.ServerVad{
@@ -203,7 +217,8 @@ func registerRealtime(application *application.Application, model string) func(c
InputAudioTranscription: &types.AudioTranscription{
Model: sttModel,
},
Conversations: make(map[string]*Conversation),
Conversations: make(map[string]*Conversation),
InputSampleRate: defaultRemoteSampleRate,
}
// Create a default conversation
@@ -383,7 +398,36 @@ func registerRealtime(application *application.Application, model string) func(c
case types.ConversationItemCreateEvent:
xlog.Debug("recv", "message", string(msg))
sendNotImplemented(c, "conversation.item.create")
// Add the item to the conversation
item := e.Item
// Ensure IDs are present
if item.User != nil && item.User.ID == "" {
item.User.ID = generateItemID()
}
if item.Assistant != nil && item.Assistant.ID == "" {
item.Assistant.ID = generateItemID()
}
if item.System != nil && item.System.ID == "" {
item.System.ID = generateItemID()
}
if item.FunctionCall != nil && item.FunctionCall.ID == "" {
item.FunctionCall.ID = generateItemID()
}
if item.FunctionCallOutput != nil && item.FunctionCallOutput.ID == "" {
item.FunctionCallOutput.ID = generateItemID()
}
conversation.Lock.Lock()
conversation.Items = append(conversation.Items, &item)
conversation.Lock.Unlock()
sendEvent(c, types.ConversationItemAddedEvent{
ServerEventBase: types.ServerEventBase{
EventID: e.EventID,
},
PreviousItemID: e.PreviousItemID,
Item: item,
})
case types.ConversationItemDeleteEvent:
sendError(c, "not_implemented", "Deleting items not implemented", "", "event_TODO")
@@ -429,7 +473,34 @@ func registerRealtime(application *application.Application, model string) func(c
case types.ResponseCreateEvent:
xlog.Debug("recv", "message", string(msg))
sendNotImplemented(c, "response.create")
// Handle optional items to add to context
if len(e.Response.Input) > 0 {
conversation.Lock.Lock()
for _, item := range e.Response.Input {
// Ensure IDs are present
if item.User != nil && item.User.ID == "" {
item.User.ID = generateItemID()
}
if item.Assistant != nil && item.Assistant.ID == "" {
item.Assistant.ID = generateItemID()
}
if item.System != nil && item.System.ID == "" {
item.System.ID = generateItemID()
}
if item.FunctionCall != nil && item.FunctionCall.ID == "" {
item.FunctionCall.ID = generateItemID()
}
if item.FunctionCallOutput != nil && item.FunctionCallOutput.ID == "" {
item.FunctionCallOutput.ID = generateItemID()
}
conversation.Items = append(conversation.Items, &item)
}
conversation.Lock.Unlock()
}
go triggerResponse(session, conversation, c, &e.Response)
case types.ResponseCancelEvent:
xlog.Debug("recv", "message", string(msg))
@@ -456,7 +527,7 @@ func registerRealtime(application *application.Application, model string) func(c
}
// Helper function to send events to the client
func sendEvent(c *websocket.Conn, event types.ServerEvent) {
func sendEvent(c *LockedWebsocket, event types.ServerEvent) {
eventBytes, err := json.Marshal(event)
if err != nil {
xlog.Error("failed to marshal event", "error", err)
@@ -468,7 +539,7 @@ func sendEvent(c *websocket.Conn, event types.ServerEvent) {
}
// Helper function to send errors to the client
func sendError(c *websocket.Conn, code, message, param, eventID string) {
func sendError(c *LockedWebsocket, code, message, param, eventID string) {
errorEvent := types.ErrorEvent{
ServerEventBase: types.ServerEventBase{
EventID: eventID,
@@ -485,7 +556,7 @@ func sendError(c *websocket.Conn, code, message, param, eventID string) {
sendEvent(c, errorEvent)
}
func sendNotImplemented(c *websocket.Conn, message string) {
func sendNotImplemented(c *LockedWebsocket, message string) {
sendError(c, "not_implemented", message, "", "event_TODO")
}
@@ -530,6 +601,12 @@ func updateTransSession(session *Session, update *types.SessionUnion, cl *config
session.TurnDetection = update.Transcription.Audio.Input.TurnDetection
}
if update.Transcription.Audio.Input.Format != nil && update.Transcription.Audio.Input.Format.PCM != nil {
if update.Transcription.Audio.Input.Format.PCM.Rate > 0 {
session.InputSampleRate = update.Transcription.Audio.Input.Format.PCM.Rate
}
}
return nil
}
@@ -557,13 +634,13 @@ func updateSession(session *Session, update *types.SessionUnion, cl *config.Mode
session.InputAudioTranscription = &types.AudioTranscription{}
}
session.InputAudioTranscription.Model = cfg.Pipeline.Transcription
session.Voice = cfg.Pipeline.TTS
session.Voice = cfg.TTSConfig.Voice
session.Model = rt.Model
session.ModelConfig = cfg
}
if rt.Audio != nil && rt.Audio.Output != nil && rt.Audio.Output.Voice != "" {
xlog.Warn("Ignoring voice setting; not implemented", "voice", rt.Audio.Output.Voice)
session.Voice = string(rt.Audio.Output.Voice)
}
if rt.Audio != nil && rt.Audio.Input != nil && rt.Audio.Input.Transcription != nil {
@@ -583,6 +660,12 @@ func updateSession(session *Session, update *types.SessionUnion, cl *config.Mode
session.TurnDetection = rt.Audio.Input.TurnDetection
}
if rt.Audio != nil && rt.Audio.Input != nil && rt.Audio.Input.Format != nil && rt.Audio.Input.Format.PCM != nil {
if rt.Audio.Input.Format.PCM.Rate > 0 {
session.InputSampleRate = rt.Audio.Input.Format.PCM.Rate
}
}
if rt.Instructions != "" {
session.Instructions = rt.Instructions
}
@@ -599,7 +682,7 @@ func updateSession(session *Session, update *types.SessionUnion, cl *config.Mode
// handleVAD is a goroutine that listens for audio data from the client,
// runs VAD on the audio data, and commits utterances to the conversation
func handleVAD(session *Session, conv *Conversation, c *websocket.Conn, done chan struct{}) {
func handleVAD(session *Session, conv *Conversation, c *LockedWebsocket, done chan struct{}) {
vadContext, cancel := context.WithCancel(context.Background())
go func() {
<-done
@@ -628,12 +711,12 @@ func handleVAD(session *Session, conv *Conversation, c *websocket.Conn, done cha
session.AudioBufferLock.Unlock()
aints := sound.BytesToInt16sLE(allAudio)
if len(aints) == 0 || len(aints) < int(silenceThreshold)*remoteSampleRate {
if len(aints) == 0 || len(aints) < int(silenceThreshold)*session.InputSampleRate {
continue
}
// Resample from 24kHz to 16kHz
aints = sound.ResampleInt16(aints, remoteSampleRate, localSampleRate)
// Resample from InputSampleRate to 16kHz
aints = sound.ResampleInt16(aints, session.InputSampleRate, localSampleRate)
segments, err := runVAD(vadContext, session, aints)
if err != nil {
@@ -713,7 +796,7 @@ func handleVAD(session *Session, conv *Conversation, c *websocket.Conn, done cha
}
}
func commitUtterance(ctx context.Context, utt []byte, session *Session, conv *Conversation, c *websocket.Conn) {
func commitUtterance(ctx context.Context, utt []byte, session *Session, conv *Conversation, c *LockedWebsocket) {
if len(utt) == 0 {
return
}
@@ -746,6 +829,10 @@ func commitUtterance(ctx context.Context, utt []byte, session *Session, conv *Co
tr, err := session.ModelInterface.Transcribe(ctx, f.Name(), session.InputAudioTranscription.Language, false, false, session.InputAudioTranscription.Prompt)
if err != nil {
sendError(c, "transcription_failed", err.Error(), "", "event_TODO")
return
} else if tr == nil {
sendError(c, "transcription_failed", "trancribe result is nil", "", "event_TODO")
return
}
transcript = tr.Text
@@ -791,11 +878,10 @@ func runVAD(ctx context.Context, session *Session, adata []int16) ([]schema.VADS
}
// Function to generate a response based on the conversation
func generateResponse(session *Session, utt []byte, transcript string, conv *Conversation, c *websocket.Conn, mt int) {
func generateResponse(session *Session, utt []byte, transcript string, conv *Conversation, c *LockedWebsocket, mt int) {
xlog.Debug("Generating realtime response...")
config := session.ModelInterface.PredictConfig()
// Create user message item
item := types.MessageItemUnion{
User: &types.MessageItemUser{
ID: generateItemID(),
@@ -817,33 +903,73 @@ func generateResponse(session *Session, utt []byte, transcript string, conv *Con
Item: item,
})
triggerResponse(session, conv, c, nil)
}
func triggerResponse(session *Session, conv *Conversation, c *LockedWebsocket, overrides *types.ResponseCreateParams) {
config := session.ModelInterface.PredictConfig()
// Default values
tools := session.Tools
toolChoice := session.ToolChoice
instructions := session.Instructions
// Overrides
if overrides != nil {
if overrides.Tools != nil {
tools = overrides.Tools
}
if overrides.ToolChoice != nil {
toolChoice = overrides.ToolChoice
}
if overrides.Instructions != "" {
instructions = overrides.Instructions
}
}
var conversationHistory schema.Messages
conversationHistory = append(conversationHistory, schema.Message{
Role: string(types.MessageRoleSystem),
StringContent: session.Instructions,
Content: session.Instructions,
StringContent: instructions,
Content: instructions,
})
imgIndex := 0
conv.Lock.Lock()
for _, item := range conv.Items {
if item.User != nil {
msg := schema.Message{
Role: string(types.MessageRoleUser),
}
textContent := ""
nrOfImgsInMessage := 0
for _, content := range item.User.Content {
switch content.Type {
case types.MessageContentTypeInputText:
conversationHistory = append(conversationHistory, schema.Message{
Role: string(types.MessageRoleUser),
StringContent: content.Text,
Content: content.Text,
})
textContent += content.Text
case types.MessageContentTypeInputAudio:
conversationHistory = append(conversationHistory, schema.Message{
Role: string(types.MessageRoleUser),
StringContent: content.Transcript,
Content: content.Transcript,
StringAudios: []string{content.Audio},
})
textContent += content.Transcript
case types.MessageContentTypeInputImage:
msg.StringImages = append(msg.StringImages, content.ImageURL)
imgIndex++
nrOfImgsInMessage++
}
}
if nrOfImgsInMessage > 0 {
templated, err := templates.TemplateMultiModal(config.TemplateConfig.Multimodal, templates.MultiModalOptions{
TotalImages: imgIndex,
ImagesInMessage: nrOfImgsInMessage,
}, textContent)
if err != nil {
xlog.Warn("Failed to apply multimodal template", "error", err)
templated = textContent
}
msg.StringContent = templated
msg.Content = templated
} else {
msg.StringContent = textContent
msg.Content = textContent
}
conversationHistory = append(conversationHistory, msg)
} else if item.Assistant != nil {
for _, content := range item.Assistant.Content {
switch content.Type {
@@ -874,6 +1000,11 @@ func generateResponse(session *Session, utt []byte, transcript string, conv *Con
}
conv.Lock.Unlock()
var images []string
for _, m := range conversationHistory {
images = append(images, m.StringImages...)
}
responseID := generateUniqueID()
sendEvent(c, types.ResponseCreatedEvent{
ServerEventBase: types.ServerEventBase{},
@@ -884,15 +1015,15 @@ func generateResponse(session *Session, utt []byte, transcript string, conv *Con
},
})
predFunc, err := session.ModelInterface.Predict(context.TODO(), conversationHistory, nil, nil, nil, nil, session.Tools, session.ToolChoice, nil, nil, nil)
predFunc, err := session.ModelInterface.Predict(context.TODO(), conversationHistory, images, nil, nil, nil, tools, toolChoice, nil, nil, nil)
if err != nil {
sendError(c, "inference_failed", fmt.Sprintf("backend error: %v", err), "", item.Assistant.ID)
sendError(c, "inference_failed", fmt.Sprintf("backend error: %v", err), "", "") // item.Assistant.ID is unknown here
return
}
pred, err := predFunc()
if err != nil {
sendError(c, "prediction_failed", fmt.Sprintf("backend error: %v", err), "", item.Assistant.ID)
sendError(c, "prediction_failed", fmt.Sprintf("backend error: %v", err), "", "")
return
}
@@ -1006,7 +1137,16 @@ func generateResponse(session *Session, utt []byte, transcript string, conv *Con
sendError(c, "tts_error", fmt.Sprintf("Failed to read TTS audio: %v", err), "", item.Assistant.ID)
return
}
audioString := base64.StdEncoding.EncodeToString(audioBytes)
// Strip WAV header (44 bytes) to get raw PCM data
// The OpenAI Realtime API expects raw PCM, not WAV files
const wavHeaderSize = 44
pcmData := audioBytes
if len(audioBytes) > wavHeaderSize {
pcmData = audioBytes[wavHeaderSize:]
}
audioString := base64.StdEncoding.EncodeToString(pcmData)
sendEvent(c, types.ResponseOutputAudioTranscriptDeltaEvent{
ServerEventBase: types.ServerEventBase{},
@@ -1131,7 +1271,6 @@ func generateResponse(session *Session, utt []byte, transcript string, conv *Con
Status: types.ResponseStatusCompleted,
},
})
}
// Helper functions to generate unique IDs
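The LockedWebsocket wrapper introduced above exists because several goroutines (the VAD loop, transcription commits, and response generation) now write events to the same connection, and typical websocket implementations (gorilla/websocket and compatible forks) allow only one concurrent writer. A stripped-down sketch of the same serialize-writes-behind-a-mutex idea, with a plain interface standing in for the real connection type:

package main

import (
	"fmt"
	"sync"
)

// writer stands in for *websocket.Conn; only the write path matters here.
type writer interface {
	WriteMessage(messageType int, data []byte) error
}

// lockedWriter mirrors LockedWebsocket: every write takes the mutex, so
// concurrent producers can never interleave their frames.
type lockedWriter struct {
	mu sync.Mutex
	w  writer
}

func (l *lockedWriter) WriteMessage(messageType int, data []byte) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.w.WriteMessage(messageType, data)
}

// stdoutWriter is a toy sink that makes the example runnable.
type stdoutWriter struct{}

func (stdoutWriter) WriteMessage(_ int, data []byte) error {
	fmt.Println(string(data))
	return nil
}

func main() {
	lw := &lockedWriter{w: stdoutWriter{}}
	var wg sync.WaitGroup
	// Two producers (think: transcription deltas and audio deltas) write concurrently;
	// the mutex guarantees each message goes out whole.
	for _, msg := range []string{"transcription.delta", "response.output_audio.delta"} {
		wg.Add(1)
		go func(m string) {
			defer wg.Done()
			_ = lw.WriteMessage(1, []byte(m))
		}(msg)
	}
	wg.Wait()
}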


@@ -26,6 +26,7 @@ const (
MessageContentTypeTranscript MessageContentType = "transcript"
MessageContentTypeInputText MessageContentType = "input_text"
MessageContentTypeInputAudio MessageContentType = "input_audio"
MessageContentTypeInputImage MessageContentType = "input_image"
MessageContentTypeOutputText MessageContentType = "output_text"
MessageContentTypeOutputAudio MessageContentType = "output_audio"
)


@@ -1026,10 +1026,11 @@ parameters:
if (!config.name) {
throw new Error('Model name is required');
}
if (!config.backend) {
const isPipeline = config.pipeline && (config.pipeline.vad || config.pipeline.transcription || config.pipeline.tts || config.pipeline.llm);
if (!isPipeline && !config.backend) {
throw new Error('Backend is required');
}
if (!config.parameters || !config.parameters.model) {
if (!isPipeline && (!config.parameters || !config.parameters.model)) {
throw new Error('Model file/path is required in parameters.model');
}
@@ -1041,7 +1042,6 @@ parameters:
async saveConfig() {
try {
// Validate before saving
const yamlContent = this.yamlEditor.getValue();
const config = jsyaml.load(yamlContent);
@@ -1052,13 +1052,13 @@ parameters:
if (!config.name) {
throw new Error('Model name is required');
}
if (!config.backend) {
const isPipeline = config.pipeline && (config.pipeline.vad || config.pipeline.transcription || config.pipeline.tts || config.pipeline.llm);
if (!isPipeline && !config.backend) {
throw new Error('Backend is required');
}
if (!config.parameters || !config.parameters.model) {
if (!isPipeline && (!config.parameters || !config.parameters.model)) {
throw new Error('Model file/path is required in parameters.model');
}
const endpoint = this.isEditMode ? `/models/edit/{{.ModelName}}` : '/models/import';
const response = await fetch(endpoint, {


@@ -676,6 +676,7 @@ func (s *AgentJobService) executeJobInternal(job schema.Job, task schema.Task, c
job.Status = schema.JobStatusRunning
job.StartedAt = &now
s.jobs.Set(job.ID, job)
xlog.Info("Job started", "job_id", job.ID, "task_id", job.TaskID)
// Load model config
modelConfig, err := s.configLoader.LoadModelConfigFileByNameDefaultOptions(task.Model, s.appConfig)
@@ -980,6 +981,7 @@ func (s *AgentJobService) executeJobInternal(job schema.Job, task schema.Task, c
job.Result = f.LastMessage().Content
job.CompletedAt = &completedAt
s.jobs.Set(job.ID, job)
xlog.Info("Job completed", "job_id", job.ID, "status", job.Status)
// Save to file (async)
go func() {


@@ -122,3 +122,4 @@ LocalAI supports various types of backends:
- **Diffusion Backends**: For image generation
- **TTS Backends**: For text-to-speech conversion
- **Whisper Backends**: For speech-to-text conversion
- **Sound Generation Backends**: For music and audio generation (e.g., ACE-Step)


@@ -14,7 +14,7 @@ LocalAI to ease out installations of models provide a way to preload models on s
{{% notice note %}}
The models in this gallery are not directly maintained by LocalAI. If you find a model that is not working, please open an issue on the model gallery repository.
The models in this gallery are not directly maintained by LocalAI. If you find a model that is not working, please open an issue on the [main LocalAI repository](https://github.com/mudler/LocalAI/issues).
{{% /notice %}}
{{% notice note %}}


@@ -1,3 +1,3 @@
{
"version": "v3.10.1"
"version": "v3.11.0"
}


@@ -1,4 +1,239 @@
---
- name: nemo-parakeet-tdt-0.6b
license: apache-2.0
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
description: |
NVIDIA NeMo Parakeet TDT 0.6B v3 is an automatic speech recognition (ASR) model from NVIDIA's NeMo toolkit. Parakeet models are state-of-the-art ASR models trained on large-scale English audio data.
urls:
- https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3
- https://github.com/NVIDIA/NeMo
tags:
- stt
- speech-to-text
- asr
- nvidia
- nemo
- parakeet
- cpu
- gpu
overrides:
backend: nemo
known_usecases:
- transcript
parameters:
model: nvidia/parakeet-tdt-0.6b-v3
- name: voxtral-mini-4b-realtime
license: apache-2.0
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
description: |
Voxtral Mini 4B Realtime is a speech-to-text model from Mistral AI. It is a 4B parameter model optimized for fast, accurate audio transcription with low latency, making it ideal for real-time applications. The model uses the Voxtral architecture for efficient audio processing.
urls:
- https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602
- https://github.com/antirez/voxtral.c
tags:
- stt
- speech-to-text
- audio-transcription
- cpu
- metal
- mistral
overrides:
backend: voxtral
known_usecases:
- transcript
parameters:
model: voxtral-model
files:
- filename: voxtral-model/consolidated.safetensors
uri: https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602/resolve/main/consolidated.safetensors
sha256: 263f178fe752c90a2ae58f037a95ed092db8b14768b0978b8c48f66979c8345d
- filename: voxtral-model/params.json
uri: https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602/resolve/main/params.json
- filename: voxtral-model/tekken.json
uri: https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602/resolve/main/tekken.json
sha256: 8434af1d39eba99f0ef46cf1450bf1a63fa941a26933a1ef5dbbf4adf0d00e44
- name: moonshine-tiny
license: apache-2.0
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
description: |
Moonshine Tiny is a lightweight speech-to-text model optimized for fast transcription. It is designed for efficient on-device ASR with high accuracy relative to its size.
urls:
- https://github.com/moonshine-ai/moonshine
tags:
- stt
- speech-to-text
- asr
- audio-transcription
- cpu
- gpu
overrides:
backend: moonshine
known_usecases:
- transcript
parameters:
model: moonshine/tiny
- name: whisperx-tiny
license: mit
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
description: |
WhisperX Tiny is a fast and accurate speech recognition model with speaker diarization capabilities. Built on OpenAI's Whisper with additional features for alignment and speaker segmentation.
urls:
- https://github.com/m-bain/whisperX
tags:
- stt
- speech-to-text
- asr
- audio-transcription
- speaker-diarization
- cpu
- gpu
overrides:
backend: whisperx
known_usecases:
- transcript
parameters:
model: tiny
- name: voxcpm-1.5
license: apache-2.0
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
description: |
VoxCPM 1.5 is an end-to-end text-to-speech (TTS) model from ModelBest. It features zero-shot voice cloning and high-quality speech synthesis capabilities.
urls:
- https://huggingface.co/openbmb/VoxCPM1.5
tags:
- tts
- text-to-speech
- voice-cloning
- cpu
- gpu
overrides:
backend: voxcpm
known_usecases:
- tts
parameters:
model: openbmb/VoxCPM1.5
- name: neutts-air
license: apache-2.0
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
description: |
NeuTTS Air is the world's first super-realistic, on-device TTS speech language model with instant voice cloning. Built on a 0.5B LLM backbone, it brings natural-sounding speech, real-time performance, and speaker cloning to local devices.
urls:
- https://github.com/neuphonic/neutts-air
tags:
- tts
- text-to-speech
- voice-cloning
- cpu
- gpu
overrides:
backend: neutts
known_usecases:
- tts
- name: vllm-omni-z-image-turbo
license: apache-2.0
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
description: |
Z-Image-Turbo via vLLM-Omni - A distilled version of Z-Image optimized for speed with only 8 NFEs. Offers sub-second inference latency on enterprise-grade H800 GPUs and fits within 16GB VRAM. Excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.
urls:
- https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
tags:
- text-to-image
- image-generation
- vllm-omni
- z-image
- cpu
- gpu
overrides:
backend: vllm-omni
known_usecases:
- image_generation
parameters:
model: Tongyi-MAI/Z-Image-Turbo
- name: vllm-omni-wan2.2-t2v
license: apache-2.0
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
description: |
Wan2.2-T2V-A14B via vLLM-Omni - Text-to-video generation model from Wan-AI. Generates high-quality videos from text prompts using a 14B parameter diffusion model.
urls:
- https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers
tags:
- text-to-video
- video-generation
- vllm-omni
- wan
- cpu
- gpu
overrides:
backend: vllm-omni
known_usecases:
- video_generation
parameters:
model: Wan-AI/Wan2.2-T2V-A14B-Diffusers
- name: vllm-omni-wan2.2-i2v
license: apache-2.0
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
description: |
Wan2.2-I2V-A14B via vLLM-Omni - Image-to-video generation model from Wan-AI. Generates high-quality videos from images using a 14B parameter diffusion model.
urls:
- https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B-Diffusers
tags:
- image-to-video
- video-generation
- vllm-omni
- wan
- cpu
- gpu
overrides:
backend: vllm-omni
known_usecases:
- video_generation
parameters:
model: Wan-AI/Wan2.2-I2V-A14B-Diffusers
- name: vllm-omni-qwen3-omni-30b
license: apache-2.0
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
description: |
Qwen3-Omni-30B-A3B-Instruct via vLLM-Omni - A large multimodal model (30B active, 3B activated per token) from Alibaba Qwen team. Supports text, image, audio, and video understanding with text and speech output. Features native multimodal understanding across all modalities.
urls:
- https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct
tags:
- llm
- multimodal
- vision
- audio
- video
- vllm-omni
- qwen3
- cpu
- gpu
overrides:
backend: vllm-omni
known_usecases:
- chat
- multimodal
parameters:
model: Qwen/Qwen3-Omni-30B-A3B-Instruct
- name: vllm-omni-qwen3-tts-custom-voice
license: apache-2.0
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
description: |
Qwen3-TTS-12Hz-1.7B-CustomVoice via vLLM-Omni - Text-to-speech model from Alibaba Qwen team with custom voice cloning capabilities. Generates natural-sounding speech with voice personalization.
urls:
- https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
tags:
- tts
- text-to-speech
- voice-cloning
- vllm-omni
- qwen3
- cpu
- gpu
overrides:
backend: vllm-omni
known_usecases:
- tts
parameters:
model: Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
- name: "ace-step-turbo"
license: mit
tags:
@@ -388,7 +623,7 @@
files:
- filename: llama-cpp/models/GLM-4.7-Flash-Q4_K_M.gguf
uri: https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF/resolve/main/GLM-4.7-Flash-Q4_K_M.gguf
sha256: 73ba18480e06ccda453a26263c0e2be2bd86294e827b1812ddea2f88bba2d924
sha256: 29837ed2c0fc5f51981adf8ac8083fcf80743c598381f13e9f06cbad0498b174
- name: "qwen3-vl-reranker-8b"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
@@ -12398,6 +12633,311 @@
- filename: llama-cpp/mmproj/mmproj-mistral-community_pixtral-12b-f16.gguf
sha256: a0b21e5a3b0f9b0b604385c45bb841142e7a5ac7660fa6a397dbc87c66b2083e
uri: huggingface://bartowski/mistral-community_pixtral-12b-GGUF/mmproj-mistral-community_pixtral-12b-f16.gguf
- !!merge <<: *mistral03
name: "mistralai_ministral-3-14b-instruct-2512-multimodal"
urls:
- https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512
- https://huggingface.co/unsloth/Ministral-3-14B-Instruct-2512-GGUF
description: |
The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities.
The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 14B can even be deployed locally, capable of fitting in 24GB of VRAM in FP8, and less if further quantized.
Key Features:
Ministral 3 14B consists of two main architectural components:
- 13.5B Language Model
- 0.4B Vision Encoder
The Ministral 3 14B Instruct model offers the following capabilities:
- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
- System Prompt: Maintains strong adherence and support for system prompts.
- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
- Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- Large Context Window: Supports a 256k context window.
This gallery entry includes mmproj for multimodality and uses Unsloth recommended defaults.
tags:
- llm
- gguf
- gpu
- mistral
- cpu
- function-calling
- multimodal
overrides:
context_size: 16384
parameters:
model: llama-cpp/models/mistralai_Ministral-3-14B-Instruct-2512-Q4_K_M.gguf
temperature: 0.15
mmproj: llama-cpp/mmproj/mmproj-mistralai_Ministral-3-14B-Instruct-2512-f32.gguf
files:
- filename: llama-cpp/models/mistralai_Ministral-3-14B-Instruct-2512-Q4_K_M.gguf
sha256: 76ce697c065f2e40f1e8e958118b02cab38e2c10a6015f7d7908036a292dc8c8
uri: huggingface://unsloth/Ministral-3-14B-Instruct-2512-GGUF/Ministral-3-14B-Instruct-2512-Q4_K_M.gguf
- filename: llama-cpp/mmproj/mmproj-mistralai_Ministral-3-14B-Instruct-2512-f32.gguf
sha256: 2740ba9e9b30b09be4282a9a9f617ec43dc47b89aed416cb09b5f698f90783b5
uri: huggingface://unsloth/Ministral-3-14B-Instruct-2512-GGUF/mmproj-F32.gguf
- !!merge <<: *mistral03
name: "mistralai_ministral-3-14b-reasoning-2512-multimodal"
urls:
- https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512
- https://huggingface.co/unsloth/Ministral-3-14B-Reasoning-2512-GGUF
description: |
The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities.
This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases.
The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 14B can even be deployed locally, capable of fitting in 32GB of VRAM in BF16, and less than 24GB of RAM/VRAM when quantized.
Key Features:
Ministral 3 14B consists of two main architectural components:
- 13.5B Language Model
- 0.4B Vision Encoder
The Ministral 3 14B Reasoning model offers the following capabilities:
- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
- System Prompt: Maintains strong adherence and support for system prompts.
- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
- Reasoning: Excels at complex, multi-step reasoning and dynamic problem-solving.
- Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- Large Context Window: Supports a 256k context window.
This gallery entry includes mmproj for multimodality and uses Unsloth recommended defaults.
tags:
- llm
- gguf
- gpu
- mistral
- cpu
- function-calling
- multimodal
overrides:
context_size: 32768
parameters:
model: llama-cpp/models/mistralai_Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf
temperature: 0.7
top_p: 0.95
mmproj: llama-cpp/mmproj/mmproj-mistralai_Ministral-3-14B-Reasoning-2512-f32.gguf
files:
- filename: llama-cpp/models/mistralai_Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf
sha256: f577390559b89ebdbfe52cc234ea334649c24e6003ffa4b6a2474c5e2a47aa17
uri: huggingface://unsloth/Ministral-3-14B-Reasoning-2512-GGUF/Ministral-3-14B-Reasoning-2512-Q4_K_M.gguf
- filename: llama-cpp/mmproj/mmproj-mistralai_Ministral-3-14B-Reasoning-2512-f32.gguf
sha256: 891bf262a032968f6e5b3d4e9ffc84cf6381890033c2f5204fbdf4817af4ab9b
uri: huggingface://unsloth/Ministral-3-14B-Reasoning-2512-GGUF/mmproj-F32.gguf
- !!merge <<: *mistral03
name: "mistralai_ministral-3-8b-instruct-2512-multimodal"
urls:
- https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512
- https://huggingface.co/unsloth/Ministral-3-8B-Instruct-2512-GGUF
description: |
A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.
The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 8B can even be deployed locally, capable of fitting in 12GB of VRAM in FP8, and less if further quantized.
Key Features:
Ministral 3 8B consists of two main architectural components:
- 8.4B Language Model
- 0.4B Vision Encoder
The Ministral 3 8B Instruct model offers the following capabilities:
- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
- System Prompt: Maintains strong adherence and support for system prompts.
- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
- Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- Large Context Window: Supports a 256k context window.
This gallery entry includes mmproj for multimodality and uses Unsloth recommended defaults.
tags:
- llm
- gguf
- gpu
- mistral
- cpu
- function-calling
- multimodal
overrides:
context_size: 16384
parameters:
model: llama-cpp/models/mistralai_Ministral-3-8B-Instruct-2512-Q4_K_M.gguf
temperature: 0.15
mmproj: llama-cpp/mmproj/mmproj-mistralai_Ministral-3-8B-Instruct-2512-f32.gguf
files:
- filename: llama-cpp/models/mistralai_Ministral-3-8B-Instruct-2512-Q4_K_M.gguf
sha256: 5dbc3647eb563b9f8d3c70ec3d906cce84b86bb35c5e0b8a36e7df3937ab7174
uri: huggingface://unsloth/Ministral-3-8B-Instruct-2512-GGUF/Ministral-3-8B-Instruct-2512-Q4_K_M.gguf
- filename: llama-cpp/mmproj/mmproj-mistralai_Ministral-3-8B-Instruct-2512-f32.gguf
sha256: 242d11ff65ef844b0aac4e28d4b1318813370608845f17b3ef5826fd7e7fd015
uri: huggingface://unsloth/Ministral-3-8B-Instruct-2512-GGUF/mmproj-F32.gguf
- !!merge <<: *mistral03
name: "mistralai_ministral-3-8b-reasoning-2512-multimodal"
urls:
- https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512
- https://huggingface.co/unsloth/Ministral-3-8B-Reasoning-2512-GGUF
description: |
A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.
This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases.
The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 8B can even be deployed locally, capable of fitting in 24GB of VRAM in BF16, and less than 12GB of RAM/VRAM when quantized.
Key Features:
Ministral 3 8B consists of two main architectural components:
- 8.4B Language Model
- 0.4B Vision Encoder
The Ministral 3 8B Reasoning model offers the following capabilities:
- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
- System Prompt: Maintains strong adherence and support for system prompts.
- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
- Reasoning: Excels at complex, multi-step reasoning and dynamic problem-solving.
- Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- Large Context Window: Supports a 256k context window.
This gallery entry includes mmproj for multimodality and uses Unsloth recommended defaults.
tags:
- llm
- gguf
- gpu
- mistral
- cpu
- function-calling
- multimodal
overrides:
context_size: 32768
parameters:
model: llama-cpp/models/mistralai_Ministral-3-8B-Reasoning-2512-Q4_K_M.gguf
temperature: 0.7
top_p: 0.95
mmproj: llama-cpp/mmproj/mmproj-mistralai_Ministral-3-8B-Reasoning-2512-f32.gguf
files:
- filename: llama-cpp/models/mistralai_Ministral-3-8B-Reasoning-2512-Q4_K_M.gguf
sha256: c3d1c5ab7406a0fc9d50ad2f0d15d34d5693db00bf953e8a9cd9a243b81cb1b2
uri: huggingface://unsloth/Ministral-3-8B-Reasoning-2512-GGUF/Ministral-3-8B-Reasoning-2512-Q4_K_M.gguf
- filename: llama-cpp/mmproj/mmproj-mistralai_Ministral-3-8B-Reasoning-2512-f32.gguf
sha256: 92252621cb957949379ff81ee14b15887d37eade3845a6e937e571b98c2c84c2
uri: huggingface://unsloth/Ministral-3-8B-Reasoning-2512-GGUF/mmproj-F32.gguf
- !!merge <<: *mistral03
name: "mistralai_ministral-3-3b-instruct-2512-multimodal"
urls:
- https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512
- https://huggingface.co/unsloth/Ministral-3-3B-Instruct-2512-GGUF
description: |
The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.
The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, capable of fitting in 8GB of VRAM in FP8, and less if further quantized.
Key Features:
Ministral 3 3B consists of two main architectural components:
- 3.4B Language Model
- 0.4B Vision Encoder
The Ministral 3 3B Instruct model offers the following capabilities:
- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
- System Prompt: Maintains strong adherence and support for system prompts.
- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
- Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- Large Context Window: Supports a 256k context window.
This gallery entry includes mmproj for multimodality and uses Unsloth recommended defaults.
tags:
- llm
- gguf
- gpu
- mistral
- cpu
- function-calling
- multimodal
overrides:
context_size: 16384
parameters:
model: llama-cpp/models/mistralai_Ministral-3-3B-Instruct-2512-Q4_K_M.gguf
temperature: 0.15
mmproj: llama-cpp/mmproj/mmproj-mistralai_Ministral-3-3B-Instruct-2512-f32.gguf
files:
- filename: llama-cpp/models/mistralai_Ministral-3-3B-Instruct-2512-Q4_K_M.gguf
sha256: fd46fc371ff0509bfa8657ac956b7de8534d7d9baaa4947975c0648c3aa397f4
uri: huggingface://unsloth/Ministral-3-3B-Instruct-2512-GGUF/Ministral-3-3B-Instruct-2512-Q4_K_M.gguf
- filename: llama-cpp/mmproj/mmproj-mistralai_Ministral-3-3B-Instruct-2512-f32.gguf
sha256: 57bb4e6f01166985ca2fc16061be4023fcb95cb8e60f445b8d0bf1ee30268636
uri: huggingface://unsloth/Ministral-3-3B-Instruct-2512-GGUF/mmproj-F32.gguf
- !!merge <<: *mistral03
name: "mistralai_ministral-3-3b-reasoning-2512-multimodal"
urls:
- https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512
- https://huggingface.co/unsloth/Ministral-3-3B-Reasoning-2512-GGUF
description: |
The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.
This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases.
The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 16GB of VRAM in BF16, and less than 8GB of RAM/VRAM when quantized.
Key Features:
Ministral 3 3B consists of two main architectural components:
- 3.4B Language Model
- 0.4B Vision Encoder
The Ministral 3 3B Reasoning model offers the following capabilities:
- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
- System Prompt: Maintains strong adherence and support for system prompts.
- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
- Reasoning: Excels at complex, multi-step reasoning and dynamic problem-solving.
- Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- Large Context Window: Supports a 256k context window.
This gallery entry includes mmproj for multimodality and uses Unsloth recommended defaults.
tags:
- llm
- gguf
- gpu
- mistral
- cpu
- function-calling
- multimodal
overrides:
context_size: 32768
parameters:
model: llama-cpp/models/mistralai_Ministral-3-3B-Reasoning-2512-Q4_K_M.gguf
temperature: 0.7
top_p: 0.95
mmproj: llama-cpp/mmproj/mmproj-mistralai_Ministral-3-3B-Reasoning-2512-f32.gguf
files:
- filename: llama-cpp/models/mistralai_Ministral-3-3B-Reasoning-2512-Q4_K_M.gguf
sha256: a2648395d533b6d1408667d00e0b778f3823f3f3179ba371f89355f2e957e42e
uri: huggingface://unsloth/Ministral-3-3B-Reasoning-2512-GGUF/Ministral-3-3B-Reasoning-2512-Q4_K_M.gguf
- filename: llama-cpp/mmproj/mmproj-mistralai_Ministral-3-3B-Reasoning-2512-f32.gguf
sha256: 8035a6a10dfc6250f50c62764fae3ac2ef6d693fc9252307c7093198aabba812
uri: huggingface://unsloth/Ministral-3-3B-Reasoning-2512-GGUF/mmproj-F32.gguf
- &mudler
url: "github:mudler/LocalAI/gallery/mudler.yaml@master" ### START mudler's LocalAI specific-models
name: "LocalAI-llama3-8b-function-call-v0.2"
@@ -12907,6 +13447,61 @@
- filename: "phi-2-orange.Q4_0.gguf"
sha256: "49cb710ae688e1b19b1b299087fa40765a0cd677e3afcc45e5f7ef6750975dcf"
uri: "huggingface://TheBloke/phi-2-orange-GGUF/phi-2-orange.Q4_0.gguf"
- url: "github:mudler/LocalAI/gallery/phi-3-chat.yaml@master"
icon: https://cdn-avatars.huggingface.co/v1/production/uploads/652feb6b4e527bd115ffd6c8/YFwodyNe6LmUrzQNmrl-D.png
license: mit
tags:
- llm
- gguf
- gpu
- cpu
- phi-3
name: "npc-llm-3-8b"
urls:
- https://huggingface.co/Gigax/NPC-LLM-3_8B
- https://huggingface.co/bartowski/NPC-LLM-3_8B-GGUF
description: |
NPC model fine-tuned from Phi-3, using LoRA.
This model parses a text description of a game scene, and outputs commands like:
- say <player1> "Hello Adventurer, care to join me on a quest?"
- greet <player1>
- attack <player1>
- Any other <action> <param> you add to the prompt! (We call these "skills"!)
⚠️ This model has been trained to overfit on a specific input prompt format. Follow it closely to reach optimal performance ⚠️
Input prompt
Here's a sample input prompt, showing you the format on which the model has been trained:
- WORLD KNOWLEDGE: A vast open world full of mystery and adventure.
- KNOWN LOCATIONS: Old Town
- NPCS: John the Brave
- CURRENT LOCATION: Old Town: A quiet and peaceful town.
- CURRENT LOCATION ITEMS: Sword
- LAST EVENTS:
Aldren: Say Sword What a fine sword!
- PROTAGONIST NAME: Aldren
- PROTAGONIST PSYCHOLOGICAL PROFILE: Brave and curious
- PROTAGONIST MEMORIES:
Saved the village
Lost a friend
- PROTAGONIST PENDING QUESTS:
Find the ancient artifact
Defeat the evil warlock
- PROTAGONIST ALLOWED ACTIONS:
Attack <character> : Deliver a powerful blow
Aldren:
overrides:
context_size: 4096
parameters:
model: NPC-LLM-3_8B-Q4_K_M.gguf
files:
- filename: NPC-LLM-3_8B-Q4_K_M.gguf
uri: huggingface://bartowski/NPC-LLM-3_8B-GGUF/NPC-LLM-3_8B-Q4_K_M.gguf
sha256: 5fcfb314566f0ae9364fe80237f96b12678aafbb8e82f90c6aece5ed2a6b83fd
### Internlm2
- name: "internlm2_5-7b-chat-1m"
url: "github:mudler/LocalAI/gallery/chatml.yaml@master"

go.mod

@@ -8,10 +8,11 @@ require (
dario.cat/mergo v1.0.2
fyne.io/fyne/v2 v2.7.2
github.com/Masterminds/sprig/v3 v3.3.0
github.com/alecthomas/kong v1.13.0
github.com/anthropics/anthropic-sdk-go v1.20.0
github.com/alecthomas/kong v1.14.0
github.com/anthropics/anthropic-sdk-go v1.22.0
github.com/charmbracelet/glamour v0.10.0
github.com/containerd/containerd v1.7.30
github.com/dhowden/tag v0.0.0-20240417053706-3d75831295e8
github.com/ebitengine/purego v0.9.1
github.com/emirpasic/gods/v2 v2.0.0-alpha
github.com/fsnotify/fsnotify v1.9.0
@@ -23,7 +24,7 @@ require (
github.com/gpustack/gguf-parser-go v0.23.1
github.com/hpcloud/tail v1.0.0
github.com/ipfs/go-log v1.0.5
github.com/jaypipes/ghw v0.21.2
github.com/jaypipes/ghw v0.22.0
github.com/joho/godotenv v1.5.1
github.com/klauspost/cpuid/v2 v2.3.0
github.com/labstack/echo/v4 v4.15.0
@@ -37,9 +38,9 @@ require (
github.com/mudler/go-processmanager v0.1.0
github.com/mudler/memory v0.0.0-20251216220809-d1256471a6c2
github.com/mudler/xlog v0.0.5
github.com/onsi/ginkgo/v2 v2.28.0
github.com/onsi/ginkgo/v2 v2.28.1
github.com/onsi/gomega v1.39.1
github.com/openai/openai-go/v3 v3.17.0
github.com/openai/openai-go/v3 v3.19.0
github.com/otiai10/copy v1.14.1
github.com/otiai10/openaigo v1.7.0
github.com/phayes/freeport v0.0.0-20220201140144-74d24b5ae9f5
@@ -66,7 +67,6 @@ require (
)
require (
github.com/dhowden/tag v0.0.0-20240417053706-3d75831295e8 // indirect
github.com/ghodss/yaml v1.0.0 // indirect
github.com/labstack/gommon v0.4.2 // indirect
github.com/swaggo/files/v2 v2.0.2 // indirect

go.sum

@@ -36,16 +36,16 @@ github.com/alecthomas/assert/v2 v2.11.0 h1:2Q9r3ki8+JYXvGsDyBXwH3LcJ+WK5D0gc5E8v
github.com/alecthomas/assert/v2 v2.11.0/go.mod h1:Bze95FyfUr7x34QZrjL+XP+0qgp/zg8yS+TtBj1WA3k=
github.com/alecthomas/chroma/v2 v2.14.0 h1:R3+wzpnUArGcQz7fCETQBzO5n9IMNi13iIs46aU4V9E=
github.com/alecthomas/chroma/v2 v2.14.0/go.mod h1:QolEbTfmUHIMVpBqxeDnNBj2uoeI4EbYP4i6n68SG4I=
github.com/alecthomas/kong v1.13.0 h1:5e/7XC3ugvhP1DQBmTS+WuHtCbcv44hsohMgcvVxSrA=
github.com/alecthomas/kong v1.13.0/go.mod h1:wrlbXem1CWqUV5Vbmss5ISYhsVPkBb1Yo7YKJghju2I=
github.com/alecthomas/kong v1.14.0 h1:gFgEUZWu2ZmZ+UhyZ1bDhuutbKN1nTtJTwh19Wsn21s=
github.com/alecthomas/kong v1.14.0/go.mod h1:wrlbXem1CWqUV5Vbmss5ISYhsVPkBb1Yo7YKJghju2I=
github.com/alecthomas/repr v0.5.2 h1:SU73FTI9D1P5UNtvseffFSGmdNci/O6RsqzeXJtP0Qs=
github.com/alecthomas/repr v0.5.2/go.mod h1:Fr0507jx4eOXV7AlPV6AVZLYrLIuIeSOWtW57eE/O/4=
github.com/andybalholm/brotli v1.0.1/go.mod h1:loMXtMfwqflxFJPmdbJO0a3KNoPuLBgiu3qAvBg8x/Y=
github.com/andybalholm/brotli v1.2.0 h1:ukwgCxwYrmACq68yiUqwIWnGY0cTPox/M94sVwToPjQ=
github.com/andybalholm/brotli v1.2.0/go.mod h1:rzTDkvFWvIrjDXZHkuS16NPggd91W3kUSvPlQ1pLaKY=
github.com/anmitsu/go-shlex v0.0.0-20161002113705-648efa622239/go.mod h1:2FmKhYUyUczH0OGQWaF5ceTx0UBShxjsH6f8oGKYe2c=
github.com/anthropics/anthropic-sdk-go v1.20.0 h1:KE6gQiAT1aBHMh3Dmp1WgqnyZZLJNo2oX3ka004oDLE=
github.com/anthropics/anthropic-sdk-go v1.20.0/go.mod h1:WTz31rIUHUHqai2UslPpw5CwXrQP3geYBioRV4WOLvE=
github.com/anthropics/anthropic-sdk-go v1.22.0 h1:sgo4Ob5pC5InKCi/5Ukn5t9EjPJ7KTMaKm5beOYt6rM=
github.com/anthropics/anthropic-sdk-go v1.22.0/go.mod h1:WTz31rIUHUHqai2UslPpw5CwXrQP3geYBioRV4WOLvE=
github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
github.com/aymanbagabas/go-udiff v0.2.0 h1:TK0fH4MteXUDspT88n8CKzvK0X9O2xu9yQjWpi6yML8=
@@ -342,8 +342,8 @@ github.com/ipld/go-ipld-prime v0.21.0 h1:n4JmcpOlPDIxBcY037SVfpd1G+Sj1nKZah0m6QH
github.com/ipld/go-ipld-prime v0.21.0/go.mod h1:3RLqy//ERg/y5oShXXdx5YIp50cFGOanyMctpPjsvxQ=
github.com/jackpal/go-nat-pmp v1.0.2 h1:KzKSgb7qkJvOUTqYl9/Hg/me3pWgBmERKrTGD7BdWus=
github.com/jackpal/go-nat-pmp v1.0.2/go.mod h1:QPH045xvCAeXUZOxsnwmrtiCoxIr9eob+4orBN1SBKc=
github.com/jaypipes/ghw v0.21.2 h1:woW0lqNMPbYk59sur6thOVM8YFP9Hxxr8PM+JtpUrNU=
github.com/jaypipes/ghw v0.21.2/go.mod h1:GPrvwbtPoxYUenr74+nAnWbardIZq600vJDD5HnPsPE=
github.com/jaypipes/ghw v0.22.0 h1:v3G5E1Q7UO61xV15lls5a+2jkQNjM3Z3fE+KOWRt1j4=
github.com/jaypipes/ghw v0.22.0/go.mod h1:fUNUjMZ0cjahKo+/u+32m9FutIx53Nkbi0Ti0m7j5HY=
github.com/jaypipes/pcidb v1.1.1 h1:QmPhpsbmmnCwZmHeYAATxEaoRuiMAJusKYkUncMC0ro=
github.com/jaypipes/pcidb v1.1.1/go.mod h1:x27LT2krrUgjf875KxQXKB0Ha/YXLdZRVmw6hH0G7g8=
github.com/jbenet/go-temp-err-catcher v0.1.0 h1:zpb3ZH6wIE8Shj2sKS+khgRvf7T7RABoLk/+KKHggpk=
@@ -563,12 +563,12 @@ github.com/nxadm/tail v1.4.8 h1:nPr65rt6Y5JFSKQO7qToXr7pePgD6Gwiw05lkbyAQTE=
github.com/nxadm/tail v1.4.8/go.mod h1:+ncqLTQzXmGhMZNUePPaPqPvBxHAIsmXswZKocGu+AU=
github.com/onsi/ginkgo v1.16.5 h1:8xi0RTUf59SOSfEtZMvwTvXYMzG4gV23XVHOZiXNtnE=
github.com/onsi/ginkgo v1.16.5/go.mod h1:+E8gABHa3K6zRBolWtd+ROzc/U5bkGt0FwiG042wbpU=
github.com/onsi/ginkgo/v2 v2.28.0 h1:Rrf+lVLmtlBIKv6KrIGJCjyY8N36vDVcutbGJkyqjJc=
github.com/onsi/ginkgo/v2 v2.28.0/go.mod h1:ArE1D/XhNXBXCBkKOLkbsb2c81dQHCRcF5zwn/ykDRo=
github.com/onsi/ginkgo/v2 v2.28.1 h1:S4hj+HbZp40fNKuLUQOYLDgZLwNUVn19N3Atb98NCyI=
github.com/onsi/ginkgo/v2 v2.28.1/go.mod h1:CLtbVInNckU3/+gC8LzkGUb9oF+e8W8TdUsxPwvdOgE=
github.com/onsi/gomega v1.39.1 h1:1IJLAad4zjPn2PsnhH70V4DKRFlrCzGBNrNaru+Vf28=
github.com/onsi/gomega v1.39.1/go.mod h1:hL6yVALoTOxeWudERyfppUcZXjMwIMLnuSfruD2lcfg=
github.com/openai/openai-go/v3 v3.17.0 h1:CfTkmQoItolSyW+bHOUF190KuX5+1Zv6MC0Gb4wAwy8=
github.com/openai/openai-go/v3 v3.17.0/go.mod h1:cdufnVK14cWcT9qA1rRtrXx4FTRsgbDPW7Ia7SS5cZo=
github.com/openai/openai-go/v3 v3.19.0 h1:xS/UQeSaNuL4bZjq28/rBrA4OZaq1BcYLBwQm9Vx8cI=
github.com/openai/openai-go/v3 v3.19.0/go.mod h1:cdufnVK14cWcT9qA1rRtrXx4FTRsgbDPW7Ia7SS5cZo=
github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U=
github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM=
github.com/opencontainers/image-spec v1.1.1 h1:y0fUlFfIZhPF1W537XOLg0/fcx6zcHCJwooC2xJA040=

View File

@@ -21,6 +21,10 @@ var dataURIPattern = regexp.MustCompile(`^data:([^;]+);base64,`)
// GetContentURIAsBase64 checks whether the string is a URL; if so, it downloads the content into memory, encodes it in base64 and returns the resulting string. Otherwise it returns the input string with any base64 data headers stripped.
func GetContentURIAsBase64(s string) (string, error) {
	if strings.HasPrefix(s, "http") || strings.HasPrefix(s, "https") {
		if err := ValidateExternalURL(s); err != nil {
			return "", fmt.Errorf("URL validation failed: %w", err)
		}
		// download the image
		resp, err := base64DownloadClient.Get(s)
		if err != nil {

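For illustration only (not part of the diff): a minimal sketch of how a caller might exercise both paths of GetContentURIAsBase64, the remote-URL path, which is now guarded by ValidateExternalURL before the download, and the data-URI path, which only strips the base64 header. The example program and its input URLs are assumptions made for this sketch.

package main

import (
	"fmt"

	"github.com/mudler/LocalAI/pkg/utils"
)

func main() {
	// Remote URL: the target is validated against internal/private addresses,
	// then downloaded and returned as a base64-encoded string.
	b64, err := utils.GetContentURIAsBase64("https://example.com/image.png")
	if err != nil {
		fmt.Println("rejected or failed:", err)
	} else {
		fmt.Println("base64 length:", len(b64))
	}

	// Data URI: the "data:<mime>;base64," header is stripped and the
	// remaining payload is returned unchanged.
	payload, err := utils.GetContentURIAsBase64("data:image/png;base64,iVBORw0KGgo=")
	if err == nil {
		fmt.Println("payload:", payload)
	}
}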
78
pkg/utils/urlfetch.go Normal file
View File

@@ -0,0 +1,78 @@
package utils

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// ValidateExternalURL checks that the given URL does not point to a private,
// loopback, link-local, or otherwise internal network address. This prevents
// Server-Side Request Forgery (SSRF) attacks where a user-supplied URL could
// be used to probe internal services or cloud metadata endpoints.
func ValidateExternalURL(rawURL string) error {
	parsed, err := url.Parse(rawURL)
	if err != nil {
		return fmt.Errorf("invalid URL: %w", err)
	}

	scheme := strings.ToLower(parsed.Scheme)
	if scheme != "http" && scheme != "https" {
		return fmt.Errorf("unsupported URL scheme: %s", scheme)
	}

	hostname := parsed.Hostname()
	if hostname == "" {
		return fmt.Errorf("URL has no hostname")
	}

	// Block well-known internal hostnames
	lower := strings.ToLower(hostname)
	if lower == "localhost" || strings.HasSuffix(lower, ".local") {
		return fmt.Errorf("requests to internal hosts are not allowed")
	}

	// Block cloud metadata service hostnames
	if lower == "metadata.google.internal" || lower == "instance-data" {
		return fmt.Errorf("requests to cloud metadata services are not allowed")
	}

	ips, err := net.LookupHost(hostname)
	if err != nil {
		return fmt.Errorf("failed to resolve hostname: %w", err)
	}

	for _, ipStr := range ips {
		ip := net.ParseIP(ipStr)
		if ip == nil {
			return fmt.Errorf("unable to parse resolved IP: %s", ipStr)
		}
		if !isPublicIP(ip) {
			return fmt.Errorf("requests to internal network addresses are not allowed")
		}
	}

	return nil
}

func isPublicIP(ip net.IP) bool {
	if ip.IsLoopback() ||
		ip.IsLinkLocalUnicast() ||
		ip.IsLinkLocalMulticast() ||
		ip.IsPrivate() ||
		ip.IsUnspecified() {
		return false
	}

	// Re-check the IPv4 view so IPv4-mapped IPv6 addresses that wrap a
	// loopback, link-local, private, or unspecified IPv4 address are also rejected.
	if ip4 := ip.To4(); ip4 != nil {
		return !ip4.IsLoopback() &&
			!ip4.IsLinkLocalUnicast() &&
			!ip4.IsPrivate() &&
			!ip4.IsUnspecified()
	}

	return true
}
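Since ValidateExternalURL resolves the hostname in a step separate from the later HTTP request in GetContentURIAsBase64, the same public-address policy can also be applied at dial time; a minimal sketch under that assumption follows. isPublicAddr and newGuardedClient are hypothetical names introduced for illustration (isPublicIP is unexported in pkg/utils), and nothing below is part of this change.

package main

import (
	"fmt"
	"net"
	"net/http"
	"syscall"
)

// isPublicAddr mirrors the checks performed by isPublicIP; it is a local
// stand-in used only because isPublicIP is unexported in pkg/utils.
func isPublicAddr(ip net.IP) bool {
	return !(ip.IsLoopback() || ip.IsLinkLocalUnicast() ||
		ip.IsLinkLocalMulticast() || ip.IsPrivate() || ip.IsUnspecified())
}

// newGuardedClient returns an http.Client whose dialer re-checks the literal
// address it is about to connect to, so a hostname that resolves to an
// internal address at fetch time is still rejected.
func newGuardedClient() *http.Client {
	dialer := &net.Dialer{
		// Control runs after address resolution but before the connection is
		// established, with the concrete IP:port that will actually be dialed.
		Control: func(network, address string, _ syscall.RawConn) error {
			host, _, err := net.SplitHostPort(address)
			if err != nil {
				return err
			}
			ip := net.ParseIP(host)
			if ip == nil || !isPublicAddr(ip) {
				return fmt.Errorf("refusing to dial non-public address %s", address)
			}
			return nil
		},
	}
	return &http.Client{Transport: &http.Transport{DialContext: dialer.DialContext}}
}

func main() {
	client := newGuardedClient()
	if _, err := client.Get("http://127.0.0.1/secret"); err != nil {
		fmt.Println(err) // refused by the dialer before any connection is made
	}
}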

View File

@@ -0,0 +1,99 @@
package utils_test

import (
	. "github.com/mudler/LocalAI/pkg/utils"
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("utils/urlfetch tests", func() {
	Context("ValidateExternalURL", func() {
		It("allows valid external HTTPS URLs", func() {
			err := ValidateExternalURL("https://example.com/image.png")
			Expect(err).To(BeNil())
		})

		It("allows valid external HTTP URLs", func() {
			err := ValidateExternalURL("http://example.com/image.png")
			Expect(err).To(BeNil())
		})

		It("blocks localhost", func() {
			err := ValidateExternalURL("http://localhost/secret")
			Expect(err).ToNot(BeNil())
			Expect(err.Error()).To(ContainSubstring("internal"))
		})

		It("blocks 127.0.0.1", func() {
			err := ValidateExternalURL("http://127.0.0.1/secret")
			Expect(err).ToNot(BeNil())
			Expect(err.Error()).To(ContainSubstring("internal"))
		})

		It("blocks private 10.x.x.x range", func() {
			err := ValidateExternalURL("http://10.0.0.1/secret")
			Expect(err).ToNot(BeNil())
			Expect(err.Error()).To(ContainSubstring("internal"))
		})

		It("blocks private 172.16.x.x range", func() {
			err := ValidateExternalURL("http://172.16.0.1/secret")
			Expect(err).ToNot(BeNil())
			Expect(err.Error()).To(ContainSubstring("internal"))
		})

		It("blocks private 192.168.x.x range", func() {
			err := ValidateExternalURL("http://192.168.1.1/secret")
			Expect(err).ToNot(BeNil())
			Expect(err.Error()).To(ContainSubstring("internal"))
		})

		It("blocks link-local 169.254.x.x (AWS metadata)", func() {
			err := ValidateExternalURL("http://169.254.169.254/latest/meta-data/")
			Expect(err).ToNot(BeNil())
			Expect(err.Error()).To(ContainSubstring("internal"))
		})

		It("blocks unsupported schemes", func() {
			err := ValidateExternalURL("ftp://example.com/file")
			Expect(err).ToNot(BeNil())
			Expect(err.Error()).To(ContainSubstring("unsupported URL scheme"))
		})

		It("blocks file:// scheme", func() {
			err := ValidateExternalURL("file:///etc/passwd")
			Expect(err).ToNot(BeNil())
			Expect(err.Error()).To(ContainSubstring("unsupported URL scheme"))
		})

		It("blocks URLs with no hostname", func() {
			err := ValidateExternalURL("http:///path")
			Expect(err).ToNot(BeNil())
			Expect(err.Error()).To(ContainSubstring("no hostname"))
		})

		It("blocks .local hostnames", func() {
			err := ValidateExternalURL("http://myservice.local/api")
			Expect(err).ToNot(BeNil())
			Expect(err.Error()).To(ContainSubstring("internal"))
		})

		It("blocks metadata.google.internal", func() {
			err := ValidateExternalURL("http://metadata.google.internal/computeMetadata/v1/")
			Expect(err).ToNot(BeNil())
			Expect(err.Error()).To(ContainSubstring("metadata"))
		})

		It("blocks 0.0.0.0", func() {
			err := ValidateExternalURL("http://0.0.0.0/")
			Expect(err).ToNot(BeNil())
			Expect(err.Error()).To(ContainSubstring("internal"))
		})

		It("blocks IPv6 loopback ::1", func() {
			err := ValidateExternalURL("http://[::1]/secret")
			Expect(err).ToNot(BeNil())
			Expect(err.Error()).To(ContainSubstring("internal"))
		})
	})
})
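One practical note: Ginkgo v2 specs only run through a suite entry point defined in the same package. The repository presumably already provides one for pkg/utils; if not, a minimal bootstrap would look like the sketch below (the test function name and suite description are illustrative). With it in place, go test ./pkg/utils/... runs these ValidateExternalURL specs.

package utils_test

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// TestUtils wires Gomega's failure handler into Ginkgo and runs every spec
// registered in this package, including the ValidateExternalURL specs above.
func TestUtils(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Utils test suite")
}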