fix: this backend is CUDA only

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
feat(backends): add faster-qwen3-tts
2026-05-30 03:25:42 -04:00 · 2026-02-26 23:08:24 +00:00 · 2026-02-26 23:00:59 +00:00 · 2026-02-26 23:18:53 +01:00 · 2026-02-26 23:17:33 +01:00 · 2026-02-26 21:34:47 +00:00
62 changed files with 1144 additions and 1556 deletions
--- a/.devcontainer/docker-compose-devcontainer.yml
+++ b/.devcontainer/docker-compose-devcontainer.yml
@@ -10,7 +10,8 @@ services:
      - 8080:8080
    volumes:
      - localai_workspace:/workspace
-      - ../models:/host-models
+      - models:/host-models
+      - backends:/host-backends
      - ./customization:/devcontainer-customization
    command: /bin/sh -c "while sleep 1000; do :; done"
    cap_add:
@@ -39,6 +40,9 @@ services:
      - GF_SECURITY_ADMIN_PASSWORD=grafana
    volumes:
      - ./grafana:/etc/grafana/provisioning/datasources
+
 volumes:
  prom_data:
-  localai_workspace:
+  localai_workspace:
+  models:
+  backends:
--- a/.github/workflows/backend.yml
+++ b/.github/workflows/backend.yml
@@ -210,6 +210,19 @@ jobs:
            dockerfile: "./backend/Dockerfile.python"
            context: "./"
            ubuntu-version: '2404'
+          - build-type: 'cublas'
+            cuda-major-version: "12"
+            cuda-minor-version: "8"
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-gpu-nvidia-cuda-12-faster-qwen3-tts'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:24.04"
+            skip-drivers: 'false'
+            backend: "faster-qwen3-tts"
+            dockerfile: "./backend/Dockerfile.python"
+            context: "./"
+            ubuntu-version: '2404'
          - build-type: 'cublas'
            cuda-major-version: "12"
            cuda-minor-version: "8"
@@ -575,6 +588,19 @@ jobs:
            dockerfile: "./backend/Dockerfile.python"
            context: "./"
            ubuntu-version: '2404'
+          - build-type: 'cublas'
+            cuda-major-version: "13"
+            cuda-minor-version: "0"
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-gpu-nvidia-cuda-13-faster-qwen3-tts'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:24.04"
+            skip-drivers: 'false'
+            backend: "faster-qwen3-tts"
+            dockerfile: "./backend/Dockerfile.python"
+            context: "./"
+            ubuntu-version: '2404'
          - build-type: 'cublas'
            cuda-major-version: "13"
            cuda-minor-version: "0"
@@ -705,6 +731,19 @@ jobs:
            backend: "qwen-tts"
            dockerfile: "./backend/Dockerfile.python"
            context: "./"
+          - build-type: 'l4t'
+            cuda-major-version: "13"
+            cuda-minor-version: "0"
+            platforms: 'linux/arm64'
+            tag-latest: 'auto'
+            tag-suffix: '-nvidia-l4t-cuda-13-arm64-faster-qwen3-tts'
+            runs-on: 'ubuntu-24.04-arm'
+            base-image: "ubuntu:24.04"
+            skip-drivers: 'false'
+            ubuntu-version: '2404'
+            backend: "faster-qwen3-tts"
+            dockerfile: "./backend/Dockerfile.python"
+            context: "./"
          - build-type: 'l4t'
            cuda-major-version: "13"
            cuda-minor-version: "0"
@@ -718,6 +757,19 @@ jobs:
            backend: "pocket-tts"
            dockerfile: "./backend/Dockerfile.python"
            context: "./"
+          - build-type: 'l4t'
+            cuda-major-version: "13"
+            cuda-minor-version: "0"
+            platforms: 'linux/arm64'
+            tag-latest: 'auto'
+            tag-suffix: '-nvidia-l4t-cuda-13-arm64-chatterbox'
+            runs-on: 'ubuntu-24.04-arm'
+            base-image: "ubuntu:24.04"
+            skip-drivers: 'false'
+            ubuntu-version: '2404'
+            backend: "chatterbox"
+            dockerfile: "./backend/Dockerfile.python"
+            context: "./"
          - build-type: 'l4t'
            cuda-major-version: "13"
            cuda-minor-version: "0"
@@ -1293,6 +1345,19 @@ jobs:
            dockerfile: "./backend/Dockerfile.python"
            context: "./"
            ubuntu-version: '2204'
+          - build-type: 'l4t'
+            cuda-major-version: "12"
+            cuda-minor-version: "0"
+            platforms: 'linux/arm64'
+            tag-latest: 'auto'
+            tag-suffix: '-nvidia-l4t-faster-qwen3-tts'
+            runs-on: 'ubuntu-24.04-arm'
+            base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
+            skip-drivers: 'true'
+            backend: "faster-qwen3-tts"
+            dockerfile: "./backend/Dockerfile.python"
+            context: "./"
+            ubuntu-version: '2204'
          - build-type: 'l4t'
            cuda-major-version: "12"
            cuda-minor-version: "0"
@@ -1892,7 +1957,7 @@ jobs:
          - build-type: ''
            cuda-major-version: ""
            cuda-minor-version: ""
-            platforms: 'linux/amd64'
+            platforms: 'linux/amd64,linux/arm64'
            tag-latest: 'auto'
            tag-suffix: '-cpu-voxcpm'
            runs-on: 'ubuntu-latest'
--- a/.github/workflows/localaibot_automerge.yml
+++ b/.github/workflows/localaibot_automerge.yml
@@ -10,7 +10,7 @@ permissions:
  actions: write # to dispatch publish workflow
 jobs:
  dependabot:
-    if: github.repository == 'mudler/LocalAI' && github.actor == 'localai-bot' && !contains(github.event.pull_request.title, 'chore(model gallery):')
+    if: github.repository == 'mudler/LocalAI' && github.actor == 'localai-bot' && contains(github.event.pull_request.title, 'chore:')
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
--- a/.github/workflows/release.yaml
+++ b/.github/workflows/release.yaml
@@ -18,7 +18,7 @@ jobs:
        with:
          go-version: 1.23
      - name: Run GoReleaser
-        uses: goreleaser/goreleaser-action@v6
+        uses: goreleaser/goreleaser-action@v7
        with:
          version: v2.11.0
          args: release --clean
--- a/.github/workflows/stalebot.yml
+++ b/.github/workflows/stalebot.yml
@@ -11,7 +11,7 @@ jobs:
    if: github.repository == 'mudler/LocalAI'
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/stale@997185467fa4f803885201cee163a9f38240193d # v9
+      - uses: actions/stale@b5d41d4e1d5dceea10e7104786b73624c18a190f # v9
        with:
          stale-issue-message: 'This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.'
          stale-pr-message: 'This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 10 days.'
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
--- a/Makefile
+++ b/Makefile
@@ -1,5 +1,5 @@
 # Disable parallel execution for backend builds
-.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/voxtral
+.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/voxtral

 GOCMD=go
 GOTEST=$(GOCMD) test
@@ -317,6 +317,7 @@ prepare-test-extra: protogen-python
 	$(MAKE) -C backend/python/moonshine
 	$(MAKE) -C backend/python/pocket-tts
 	$(MAKE) -C backend/python/qwen-tts
+	$(MAKE) -C backend/python/faster-qwen3-tts
 	$(MAKE) -C backend/python/qwen-asr
 	$(MAKE) -C backend/python/nemo
 	$(MAKE) -C backend/python/voxcpm
@@ -334,6 +335,7 @@ test-extra: prepare-test-extra
 	$(MAKE) -C backend/python/moonshine test
 	$(MAKE) -C backend/python/pocket-tts test
 	$(MAKE) -C backend/python/qwen-tts test
+	$(MAKE) -C backend/python/faster-qwen3-tts test
 	$(MAKE) -C backend/python/qwen-asr test
 	$(MAKE) -C backend/python/nemo test
 	$(MAKE) -C backend/python/voxcpm test
@@ -473,6 +475,7 @@ BACKEND_VIBEVOICE = vibevoice|python|.|--progress=plain|true
 BACKEND_MOONSHINE = moonshine|python|.|false|true
 BACKEND_POCKET_TTS = pocket-tts|python|.|false|true
 BACKEND_QWEN_TTS = qwen-tts|python|.|false|true
+BACKEND_FASTER_QWEN3_TTS = faster-qwen3-tts|python|.|false|true
 BACKEND_QWEN_ASR = qwen-asr|python|.|false|true
 BACKEND_NEMO = nemo|python|.|false|true
 BACKEND_VOXCPM = voxcpm|python|.|false|true
@@ -525,6 +528,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_VIBEVOICE)))
 $(eval $(call generate-docker-build-target,$(BACKEND_MOONSHINE)))
 $(eval $(call generate-docker-build-target,$(BACKEND_POCKET_TTS)))
 $(eval $(call generate-docker-build-target,$(BACKEND_QWEN_TTS)))
+$(eval $(call generate-docker-build-target,$(BACKEND_FASTER_QWEN3_TTS)))
 $(eval $(call generate-docker-build-target,$(BACKEND_QWEN_ASR)))
 $(eval $(call generate-docker-build-target,$(BACKEND_NEMO)))
 $(eval $(call generate-docker-build-target,$(BACKEND_VOXCPM)))
@@ -535,7 +539,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_ACE_STEP)))
 docker-save-%: backend-images
 	docker save local-ai-backend:$* -o backend-images/$*.tar

-docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-voxtral
+docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-voxtral

 ########################################################
 ### Mock Backend for E2E Tests
--- a/README.md
+++ b/README.md
@@ -93,16 +93,7 @@ Liking LocalAI? LocalAI is part of an integrated suite of AI infrastructure tool

 ## 💻 Quickstart

-> ⚠️ **Note:** The `install.sh` script is currently experiencing issues due to the heavy changes currently undergoing in LocalAI and may produce broken or misconfigured installations. Please use Docker installation (see below) or manual binary installation until [issue #8032](https://github.com/mudler/LocalAI/issues/8032) is resolved.

-Run the installer script:
-
-```bash
-# Basic installation
-curl https://localai.io/install.sh | sh
-```
-
-For more installation options, see [Installer Options](https://localai.io/installation/).

 ### macOS Download:

--- a/backend/cpp/llama-cpp/Makefile
+++ b/backend/cpp/llama-cpp/Makefile
@@ -1,5 +1,5 @@

-LLAMA_VERSION?=ba3b9c8844aca35ecb40d31886686326f22d2214
+LLAMA_VERSION?=723c71064da0908c19683f8c344715fbf6d986fd
 LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

 CMAKE_ARGS?=
--- a/backend/index.yaml
+++ b/backend/index.yaml
@@ -528,6 +528,28 @@
    nvidia-l4t-cuda-12: "nvidia-l4t-qwen-tts"
    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-qwen-tts"
  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/620760a26e3b7210c2ff1943/-s1gyJfvbE1RgO5iBeNOi.png
+- &faster-qwen3-tts
+  urls:
+    - https://github.com/andimarafioti/faster-qwen3-tts
+    - https://pypi.org/project/faster-qwen3-tts/
+  description: |
+    Real-time Qwen3-TTS inference using CUDA graph capture. Voice clone only; requires NVIDIA GPU with CUDA.
+  tags:
+    - text-to-speech
+    - TTS
+    - voice-clone
+  license: apache-2.0
+  name: "faster-qwen3-tts"
+  alias: "faster-qwen3-tts"
+  capabilities:
+    nvidia: "cuda12-faster-qwen3-tts"
+    default: "cuda12-faster-qwen3-tts"
+    nvidia-cuda-13: "cuda13-faster-qwen3-tts"
+    nvidia-cuda-12: "cuda12-faster-qwen3-tts"
+    nvidia-l4t: "nvidia-l4t-faster-qwen3-tts"
+    nvidia-l4t-cuda-12: "nvidia-l4t-faster-qwen3-tts"
+    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-faster-qwen3-tts"
+  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/620760a26e3b7210c2ff1943/-s1gyJfvbE1RgO5iBeNOi.png
 - &qwen-asr
  urls:
    - https://github.com/QwenLM/Qwen3-ASR
@@ -2030,7 +2052,7 @@
    nvidia-cuda-13: "cuda13-chatterbox-development"
    nvidia-cuda-12: "cuda12-chatterbox-development"
    nvidia-l4t-cuda-12: "nvidia-l4t-arm64-chatterbox"
-    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-chatterbox"
+    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-chatterbox-development"
 - !!merge <<: *chatterbox
  name: "cpu-chatterbox"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-chatterbox"
@@ -2279,6 +2301,57 @@
  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-qwen-tts"
  mirrors:
    - localai/localai-backends:master-metal-darwin-arm64-qwen-tts
+## faster-qwen3-tts
+- !!merge <<: *faster-qwen3-tts
+  name: "faster-qwen3-tts-development"
+  capabilities:
+    nvidia: "cuda12-faster-qwen3-tts-development"
+    default: "cuda12-faster-qwen3-tts-development"
+    nvidia-cuda-13: "cuda13-faster-qwen3-tts-development"
+    nvidia-cuda-12: "cuda12-faster-qwen3-tts-development"
+    nvidia-l4t: "nvidia-l4t-faster-qwen3-tts-development"
+    nvidia-l4t-cuda-12: "nvidia-l4t-faster-qwen3-tts-development"
+    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-faster-qwen3-tts-development"
+- !!merge <<: *faster-qwen3-tts
+  name: "cuda12-faster-qwen3-tts"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-faster-qwen3-tts"
+  mirrors:
+    - localai/localai-backends:latest-gpu-nvidia-cuda-12-faster-qwen3-tts
+- !!merge <<: *faster-qwen3-tts
+  name: "cuda12-faster-qwen3-tts-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-faster-qwen3-tts"
+  mirrors:
+    - localai/localai-backends:master-gpu-nvidia-cuda-12-faster-qwen3-tts
+- !!merge <<: *faster-qwen3-tts
+  name: "cuda13-faster-qwen3-tts"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-faster-qwen3-tts"
+  mirrors:
+    - localai/localai-backends:latest-gpu-nvidia-cuda-13-faster-qwen3-tts
+- !!merge <<: *faster-qwen3-tts
+  name: "cuda13-faster-qwen3-tts-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-faster-qwen3-tts"
+  mirrors:
+    - localai/localai-backends:master-gpu-nvidia-cuda-13-faster-qwen3-tts
+- !!merge <<: *faster-qwen3-tts
+  name: "nvidia-l4t-faster-qwen3-tts"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-faster-qwen3-tts"
+  mirrors:
+    - localai/localai-backends:latest-nvidia-l4t-faster-qwen3-tts
+- !!merge <<: *faster-qwen3-tts
+  name: "nvidia-l4t-faster-qwen3-tts-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-faster-qwen3-tts"
+  mirrors:
+    - localai/localai-backends:master-nvidia-l4t-faster-qwen3-tts
+- !!merge <<: *faster-qwen3-tts
+  name: "cuda13-nvidia-l4t-arm64-faster-qwen3-tts"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-faster-qwen3-tts"
+  mirrors:
+    - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-faster-qwen3-tts
+- !!merge <<: *faster-qwen3-tts
+  name: "cuda13-nvidia-l4t-arm64-faster-qwen3-tts-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-faster-qwen3-tts"
+  mirrors:
+    - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-faster-qwen3-tts
 ## qwen-asr
 - !!merge <<: *qwen-asr
  name: "qwen-asr-development"
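The gallery entries added above follow a regular naming pattern: the stable entry points at a `latest-` image tag, the `-development` entry points at the `master-` tag, and the rest of the tag is the `tag-suffix` from the CI matrix. A minimal Python sketch of that pattern follows. The function name `gallery_entries` and the dict layout are illustrative assumptions, not part of LocalAI; only the registry names and tags come from the diff.

```python
# Registry and mirror prefixes as they appear in backend/index.yaml.
REGISTRY = "quay.io/go-skynet/local-ai-backends"
MIRROR = "localai/localai-backends"


def gallery_entries(name, tag_suffix):
    """Build the stable and development entries for one backend variant.

    Hypothetical helper: it only reproduces the naming convention visible
    in the diff (latest-<suffix> for stable, master-<suffix> for development).
    """
    entries = []
    for entry_name, branch in ((name, "latest"), (name + "-development", "master")):
        tag = branch + tag_suffix
        entries.append(
            {
                "name": entry_name,
                "uri": f"{REGISTRY}:{tag}",
                "mirrors": [f"{MIRROR}:{tag}"],
            }
        )
    return entries


entries = gallery_entries(
    "cuda12-faster-qwen3-tts", "-gpu-nvidia-cuda-12-faster-qwen3-tts"
)
for entry in entries:
    print(entry["name"], "->", entry["uri"])
```

Generating the entries from the CI `tag-suffix` this way would make it harder for an index entry and a published image tag to drift apart.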
--- a/backend/python/common/template/requirements.txt
+++ b/backend/python/common/template/requirements.txt
@@ -1,3 +1,3 @@
-grpcio==1.76.0
+grpcio==1.78.1
 protobuf
 grpcio-tools
--- a/backend/python/coqui/requirements.txt
+++ b/backend/python/coqui/requirements.txt
@@ -1,4 +1,4 @@
-grpcio==1.76.0
+grpcio==1.78.1
 protobuf
 certifi
 packaging==24.1
--- a/backend/python/faster-qwen3-tts/Makefile
+++ b/backend/python/faster-qwen3-tts/Makefile
@@ -0,0 +1,23 @@
+.PHONY: faster-qwen3-tts
+faster-qwen3-tts:
+	bash install.sh
+
+.PHONY: run
+run: faster-qwen3-tts
+	@echo "Running faster-qwen3-tts..."
+	bash run.sh
+	@echo "faster-qwen3-tts run."
+
+.PHONY: test
+test: faster-qwen3-tts
+	@echo "Testing faster-qwen3-tts..."
+	bash test.sh
+	@echo "faster-qwen3-tts tested."
+
+.PHONY: protogen-clean
+protogen-clean:
+	$(RM) backend_pb2_grpc.py backend_pb2.py
+
+.PHONY: clean
+clean: protogen-clean
+	rm -rf venv __pycache__
--- a/backend/python/faster-qwen3-tts/backend.py
+++ b/backend/python/faster-qwen3-tts/backend.py
@@ -0,0 +1,193 @@
+#!/usr/bin/env python3
+"""
+LocalAI gRPC server for Faster Qwen3-TTS (CUDA graph capture, voice clone only).
+"""
+from concurrent import futures
+import time
+import argparse
+import signal
+import sys
+import os
+import traceback
+import backend_pb2
+import backend_pb2_grpc
+import torch
+import soundfile as sf
+
+import grpc
+
+
+def is_float(s):
+    try:
+        float(s)
+        return True
+    except ValueError:
+        return False
+
+
+def is_int(s):
+    try:
+        int(s)
+        return True
+    except ValueError:
+        return False
+
+
+_ONE_DAY_IN_SECONDS = 60 * 60 * 24
+MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
+
+
+class BackendServicer(backend_pb2_grpc.BackendServicer):
+    def Health(self, request, context):
+        return backend_pb2.Reply(message=bytes("OK", 'utf-8'))
+
+    def LoadModel(self, request, context):
+        if not torch.cuda.is_available():
+            return backend_pb2.Result(
+                success=False,
+                message="faster-qwen3-tts requires NVIDIA GPU with CUDA"
+            )
+
+        self.options = {}
+        for opt in request.Options:
+            if ":" not in opt:
+                continue
+            key, value = opt.split(":", 1)
+            if is_int(value):
+                value = int(value)
+            elif is_float(value):
+                value = float(value)
+            elif value.lower() in ["true", "false"]:
+                value = value.lower() == "true"
+            self.options[key] = value
+
+        model_path = request.Model or "Qwen/Qwen3-TTS-12Hz-0.6B-Base"
+        self.audio_path = request.AudioPath if hasattr(request, 'AudioPath') and request.AudioPath else None
+        self.model_file = request.ModelFile if hasattr(request, 'ModelFile') and request.ModelFile else None
+        self.model_path = request.ModelPath if hasattr(request, 'ModelPath') and request.ModelPath else None
+
+        from faster_qwen3_tts import FasterQwen3TTS
+        print(f"Loading model from: {model_path}", file=sys.stderr)
+        try:
+            self.model = FasterQwen3TTS.from_pretrained(model_path)
+        except Exception as e:
+            print(f"[ERROR] Loading model: {type(e).__name__}: {e}", file=sys.stderr)
+            print(traceback.format_exc(), file=sys.stderr)
+            return backend_pb2.Result(success=False, message=str(e))
+
+        print(f"Model loaded successfully: {model_path}", file=sys.stderr)
+        return backend_pb2.Result(message="Model loaded successfully", success=True)
+
+    def _get_ref_audio_path(self, request):
+        if not self.audio_path:
+            return None
+        if os.path.isabs(self.audio_path):
+            return self.audio_path
+        if self.model_file:
+            model_file_base = os.path.dirname(self.model_file)
+            ref_path = os.path.join(model_file_base, self.audio_path)
+            if os.path.exists(ref_path):
+                return ref_path
+        if self.model_path:
+            ref_path = os.path.join(self.model_path, self.audio_path)
+            if os.path.exists(ref_path):
+                return ref_path
+        return self.audio_path
+
+    def TTS(self, request, context):
+        try:
+            if not request.dst:
+                return backend_pb2.Result(
+                    success=False,
+                    message="dst (output path) is required"
+                )
+            text = request.text.strip()
+            if not text:
+                return backend_pb2.Result(
+                    success=False,
+                    message="Text is empty"
+                )
+
+            language = request.language if hasattr(request, 'language') and request.language else None
+            if not language:
+                language = "English"
+
+            ref_audio = self._get_ref_audio_path(request)
+            if not ref_audio:
+                return backend_pb2.Result(
+                    success=False,
+                    message="AudioPath is required for voice clone (set in LoadModel)"
+                )
+            ref_text = self.options.get("ref_text")
+            if not ref_text and hasattr(request, 'ref_text') and request.ref_text:
+                ref_text = request.ref_text
+            if not ref_text:
+                return backend_pb2.Result(
+                    success=False,
+                    message="ref_text is required for voice clone (set via LoadModel Options, e.g. ref_text:Your reference transcript)"
+                )
+
+            chunk_size = self.options.get("chunk_size")
+            generation_kwargs = {}
+            if chunk_size is not None:
+                generation_kwargs["chunk_size"] = int(chunk_size)
+
+            audio_list, sr = self.model.generate_voice_clone(
+                text=text,
+                language=language,
+                ref_audio=ref_audio,
+                ref_text=ref_text,
+                **generation_kwargs
+            )
+
+            if audio_list is None or (isinstance(audio_list, list) and len(audio_list) == 0):
+                return backend_pb2.Result(
+                    success=False,
+                    message="No audio output generated"
+                )
+            audio_data = audio_list[0] if isinstance(audio_list, list) else audio_list
+            sf.write(request.dst, audio_data, sr)
+            print(f"Saved output to {request.dst}", file=sys.stderr)
+
+        except Exception as err:
+            print(f"Error in TTS: {err}", file=sys.stderr)
+            print(traceback.format_exc(), file=sys.stderr)
+            return backend_pb2.Result(success=False, message=f"Unexpected {err=}, {type(err)=}")
+
+        return backend_pb2.Result(success=True)
+
+
+def serve(address):
+    server = grpc.server(
+        futures.ThreadPoolExecutor(max_workers=MAX_WORKERS),
+        options=[
+            ('grpc.max_message_length', 50 * 1024 * 1024),
+            ('grpc.max_send_message_length', 50 * 1024 * 1024),
+            ('grpc.max_receive_message_length', 50 * 1024 * 1024),
+        ]
+    )
+    backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
+    server.add_insecure_port(address)
+    server.start()
+    print("Server started. Listening on: " + address, file=sys.stderr)
+
+    def signal_handler(sig, frame):
+        print("Received termination signal. Shutting down...", file=sys.stderr)
+        server.stop(0)
+        sys.exit(0)
+
+    signal.signal(signal.SIGINT, signal_handler)
+    signal.signal(signal.SIGTERM, signal_handler)
+
+    try:
+        while True:
+            time.sleep(_ONE_DAY_IN_SECONDS)
+    except KeyboardInterrupt:
+        server.stop(0)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Run the gRPC server.")
+    parser.add_argument("--addr", default="localhost:50051", help="The address to bind the server to.")
+    args = parser.parse_args()
+    serve(args.addr)
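`LoadModel` in backend.py turns each `key:value` entry of `request.Options` into a typed value. The order of the type checks matters: every integer literal also parses as a float, so a float check placed first swallows all integers. A minimal standalone sketch of the same parsing follows. The names `parse_options` and `coerce` are illustrative, and `temperature` and `fast` are made-up option keys; only `chunk_size` and `ref_text` appear in the backend.

```python
def coerce(value):
    """Convert an option string to int, float, bool, or leave it as str."""
    # Try int before float: "12" parses as both, and the first match wins.
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        pass
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    return value


def parse_options(raw_options):
    """Turn ["key:value", ...] into a dict with typed values."""
    parsed = {}
    for opt in raw_options:
        if ":" not in opt:
            continue  # entries without a separator are skipped
        # Split on the first colon only, so values may contain colons.
        key, value = opt.split(":", 1)
        parsed[key] = coerce(value)
    return parsed


opts = parse_options(
    ["chunk_size:12", "temperature:0.7", "fast:true", "ref_text:Hello: world", "ignored"]
)
print(opts)
```

Splitting on the first colon only is what lets a reference transcript such as `ref_text:Hello: world` keep its own punctuation.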
--- a/backend/python/faster-qwen3-tts/install.sh
+++ b/backend/python/faster-qwen3-tts/install.sh
@@ -0,0 +1,13 @@
+#!/bin/bash
+set -e
+
+EXTRA_PIP_INSTALL_FLAGS="--no-build-isolation"
+
+backend_dir=$(dirname $0)
+if [ -d $backend_dir/common ]; then
+    source $backend_dir/common/libbackend.sh
+else
+    source $backend_dir/../common/libbackend.sh
+fi
+
+installRequirements
--- a/backend/python/faster-qwen3-tts/requirements-cublas12.txt
+++ b/backend/python/faster-qwen3-tts/requirements-cublas12.txt
@@ -0,0 +1,4 @@
+--extra-index-url https://download.pytorch.org/whl/cu121
+torch
+torchaudio
+faster-qwen3-tts
--- a/backend/python/faster-qwen3-tts/requirements-cublas13.txt
+++ b/backend/python/faster-qwen3-tts/requirements-cublas13.txt
@@ -0,0 +1,4 @@
+--extra-index-url https://download.pytorch.org/whl/cu130
+torch
+torchaudio
+faster-qwen3-tts
--- a/backend/python/faster-qwen3-tts/requirements-l4t12.txt
+++ b/backend/python/faster-qwen3-tts/requirements-l4t12.txt
@@ -0,0 +1,4 @@
+--extra-index-url https://pypi.jetson-ai-lab.io/jp6/cu129/
+torch
+torchaudio
+faster-qwen3-tts
--- a/backend/python/faster-qwen3-tts/requirements-l4t13.txt
+++ b/backend/python/faster-qwen3-tts/requirements-l4t13.txt
@@ -0,0 +1,4 @@
+--extra-index-url https://download.pytorch.org/whl/cu130
+torch
+torchaudio
+faster-qwen3-tts
--- a/backend/python/faster-qwen3-tts/requirements.txt
+++ b/backend/python/faster-qwen3-tts/requirements.txt
@@ -0,0 +1,8 @@
+grpcio==1.71.0
+protobuf
+certifi
+packaging==24.1
+soundfile
+setuptools
+six
+sox
--- a/backend/python/faster-qwen3-tts/run.sh
+++ b/backend/python/faster-qwen3-tts/run.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+backend_dir=$(dirname $0)
+if [ -d $backend_dir/common ]; then
+    source $backend_dir/common/libbackend.sh
+else
+    source $backend_dir/../common/libbackend.sh
+fi
+
+startBackend $@
--- a/backend/python/faster-qwen3-tts/test.py
+++ b/backend/python/faster-qwen3-tts/test.py
@@ -0,0 +1,104 @@
+"""
+Tests for the faster-qwen3-tts gRPC backend.
+"""
+import unittest
+import subprocess
+import time
+import os
+import sys
+import tempfile
+import backend_pb2
+import backend_pb2_grpc
+import grpc
+
+
+class TestBackendServicer(unittest.TestCase):
+    def setUp(self):
+        self.service = subprocess.Popen(
+            ["python3", "backend.py", "--addr", "localhost:50052"],
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+            text=True,
+            cwd=os.path.dirname(os.path.abspath(__file__)),
+        )
+        time.sleep(15)
+
+    def tearDown(self):
+        self.service.terminate()
+        try:
+            self.service.communicate(timeout=5)
+        except subprocess.TimeoutExpired:
+            self.service.kill()
+            self.service.communicate()
+
+    def test_health(self):
+        with grpc.insecure_channel("localhost:50052") as channel:
+            stub = backend_pb2_grpc.BackendStub(channel)
+            reply = stub.Health(backend_pb2.HealthMessage(), timeout=5.0)
+        self.assertEqual(reply.message, b"OK")
+
+    @unittest.skipIf(__import__("torch").cuda.is_available(), "CUDA-required error only occurs without CUDA")
+    def test_load_model_requires_cuda(self):
+        with grpc.insecure_channel("localhost:50052") as channel:
+            stub = backend_pb2_grpc.BackendStub(channel)
+            response = stub.LoadModel(
+                backend_pb2.ModelOptions(
+                    Model="Qwen/Qwen3-TTS-12Hz-0.6B-Base", CUDA=True,
+                ),
+                timeout=10.0,
+            )
+        self.assertFalse(response.success)
+
+    @unittest.skipUnless(
+        __import__("torch").cuda.is_available(),
+        "faster-qwen3-tts TTS requires CUDA",
+    )
+    def test_tts(self):
+        import soundfile as sf
+        try:
+            with grpc.insecure_channel("localhost:50052") as channel:
+                stub = backend_pb2_grpc.BackendStub(channel)
+                ref_audio = tempfile.NamedTemporaryFile(suffix='.wav', delete=False)
+                ref_audio.close()
+                try:
+                    sr = 22050
+                    duration = 1.0
+                    samples = int(sr * duration)
+                    sf.write(ref_audio.name, [0.0] * samples, sr)
+
+                    response = stub.LoadModel(
+                        backend_pb2.ModelOptions(
+                            Model="Qwen/Qwen3-TTS-12Hz-0.6B-Base",
+                            AudioPath=ref_audio.name,
+                            Options=["ref_text:Hello world"],
+                        ),
+                        timeout=600.0,
+                    )
+                    self.assertTrue(response.success, response.message)
+
+                    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as out:
+                        output_path = out.name
+                    try:
+                        tts_response = stub.TTS(
+                            backend_pb2.TTSRequest(
+                                text="Test output.",
+                                dst=output_path,
+                                language="English",
+                            ),
+                            timeout=120.0,
+                        )
+                        self.assertTrue(tts_response.success, tts_response.message)
+                        self.assertTrue(os.path.exists(output_path))
+                        self.assertGreater(os.path.getsize(output_path), 0)
+                    finally:
+                        if os.path.exists(output_path):
+                            os.unlink(output_path)
+                finally:
+                    if os.path.exists(ref_audio.name):
+                        os.unlink(ref_audio.name)
+        except Exception as err:
+            self.fail(f"TTS test failed: {err}")
+
+
+if __name__ == "__main__":
+    unittest.main()
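`setUp` in test.py waits a fixed `time.sleep(15)` before every test, which is slow when the server starts quickly and flaky when it starts slowly. One possible alternative, sketched here as an assumption rather than as what the backend tests do, is to poll the port until it accepts a TCP connection. The helper name `wait_for_port` and its default values are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=15.0, interval=0.2):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Self-check against a throwaway listener on an ephemeral local port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
ready = wait_for_port("127.0.0.1", listener.getsockname()[1], timeout=2.0)
listener.close()
print("port ready:", ready)
```

In a test, `setUp` could call `wait_for_port("localhost", 50052)` after `subprocess.Popen` and fail early if it returns False. A TCP connect only shows that the socket is open, so a follow-up `Health` call remains the stronger readiness check.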
--- a/backend/python/faster-qwen3-tts/test.sh
+++ b/backend/python/faster-qwen3-tts/test.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+set -e
+
+backend_dir=$(dirname $0)
+if [ -d $backend_dir/common ]; then
+    source $backend_dir/common/libbackend.sh
+else
+    source $backend_dir/../common/libbackend.sh
+fi
+
+runUnittests
--- a/backend/python/rerankers/requirements.txt
+++ b/backend/python/rerankers/requirements.txt
@@ -1,3 +1,3 @@
-grpcio==1.76.0
+grpcio==1.78.1
 protobuf
 certifi
--- a/backend/python/transformers/requirements-cpu.txt
+++ b/backend/python/transformers/requirements-cpu.txt
@@ -4,5 +4,5 @@ numba==0.60.0
 accelerate
 transformers
 bitsandbytes
-sentence-transformers==5.2.2
+sentence-transformers==5.2.3
 protobuf==6.33.5
--- a/backend/python/transformers/requirements-cublas12.txt
+++ b/backend/python/transformers/requirements-cublas12.txt
@@ -4,5 +4,5 @@ llvmlite==0.43.0
 numba==0.60.0
 transformers
 bitsandbytes
-sentence-transformers==5.2.2
+sentence-transformers==5.2.3
 protobuf==6.33.5
--- a/backend/python/transformers/requirements-cublas13.txt
+++ b/backend/python/transformers/requirements-cublas13.txt
@@ -4,5 +4,5 @@ llvmlite==0.43.0
 numba==0.60.0
 transformers
 bitsandbytes
-sentence-transformers==5.2.2
+sentence-transformers==5.2.3
 protobuf==6.33.5
--- a/backend/python/transformers/requirements-hipblas.txt
+++ b/backend/python/transformers/requirements-hipblas.txt
@@ -5,5 +5,5 @@ transformers
 llvmlite==0.43.0
 numba==0.60.0
 bitsandbytes
-sentence-transformers==5.2.2
+sentence-transformers==5.2.3
 protobuf==6.33.5
--- a/backend/python/transformers/requirements-intel.txt
+++ b/backend/python/transformers/requirements-intel.txt
@@ -5,5 +5,5 @@ llvmlite==0.43.0
 numba==0.60.0
 transformers
 bitsandbytes
-sentence-transformers==5.2.2
+sentence-transformers==5.2.3
 protobuf==6.33.5
--- a/backend/python/transformers/requirements-mps.txt
+++ b/backend/python/transformers/requirements-mps.txt
@@ -4,5 +4,5 @@ numba==0.60.0
 accelerate
 transformers
 bitsandbytes
-sentence-transformers==5.2.2
+sentence-transformers==5.2.3
 protobuf==6.33.5
--- a/backend/python/transformers/requirements.txt
+++ b/backend/python/transformers/requirements.txt
@@ -1,4 +1,4 @@
-grpcio==1.76.0
+grpcio==1.78.1
 protobuf==6.33.5
 certifi
 setuptools
--- a/backend/python/vllm/requirements.txt
+++ b/backend/python/vllm/requirements.txt
@@ -1,4 +1,4 @@
-grpcio==1.76.0
+grpcio==1.78.1
 protobuf
 certifi
 setuptools
--- a/core/cli/run.go
+++ b/core/cli/run.go
@@ -71,6 +71,7 @@ type RunCMD struct {
 	WatchdogIdleTimeout                string   `env:"LOCALAI_WATCHDOG_IDLE_TIMEOUT,WATCHDOG_IDLE_TIMEOUT" default:"15m" help:"Threshold beyond which an idle backend should be stopped" group:"backends"`
 	EnableWatchdogBusy                 bool     `env:"LOCALAI_WATCHDOG_BUSY,WATCHDOG_BUSY" default:"false" help:"Enable watchdog for stopping backends that are busy longer than the watchdog-busy-timeout" group:"backends"`
 	WatchdogBusyTimeout                string   `env:"LOCALAI_WATCHDOG_BUSY_TIMEOUT,WATCHDOG_BUSY_TIMEOUT" default:"5m" help:"Threshold beyond which a busy backend should be stopped" group:"backends"`
+	WatchdogInterval                   string   `env:"LOCALAI_WATCHDOG_INTERVAL,WATCHDOG_INTERVAL" default:"500ms" help:"Interval between watchdog checks (e.g., 500ms, 5s, 1m) (default: 500ms)" group:"backends"`
 	EnableMemoryReclaimer              bool     `env:"LOCALAI_MEMORY_RECLAIMER,MEMORY_RECLAIMER,LOCALAI_GPU_RECLAIMER,GPU_RECLAIMER" default:"false" help:"Enable memory threshold monitoring to auto-evict backends when memory usage exceeds threshold (uses GPU VRAM if available, otherwise RAM)" group:"backends"`
 	MemoryReclaimerThreshold           float64  `env:"LOCALAI_MEMORY_RECLAIMER_THRESHOLD,MEMORY_RECLAIMER_THRESHOLD,LOCALAI_GPU_RECLAIMER_THRESHOLD,GPU_RECLAIMER_THRESHOLD" default:"0.95" help:"Memory usage threshold (0.0-1.0) that triggers backend eviction (default 0.95 = 95%%)" group:"backends"`
 	ForceEvictionWhenBusy              bool     `env:"LOCALAI_FORCE_EVICTION_WHEN_BUSY,FORCE_EVICTION_WHEN_BUSY" default:"false" help:"Force eviction even when models have active API calls (default: false for safety)" group:"backends"`
@@ -215,6 +216,13 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
 			}
 			opts = append(opts, config.SetWatchDogBusyTimeout(dur))
 		}
+		if r.WatchdogInterval != "" {
+			dur, err := time.ParseDuration(r.WatchdogInterval)
+			if err != nil {
+				return err
+			}
+			opts = append(opts, config.SetWatchDogInterval(dur))
+		}
 	}

 	// Handle memory reclaimer (uses GPU VRAM if available, otherwise RAM)
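Review note: the hunk above passes the result of `time.ParseDuration` straight to `config.SetWatchDogInterval`, so a value such as `0s` or `-1s` parses without error. The sketch below shows one way the value could be validated first. It assumes the interval eventually drives something like `time.NewTicker`, which panics on a non-positive duration; the diff does not show that. `parseWatchdogInterval` and `defaultWatchdogInterval` are hypothetical names, not part of LocalAI.

```go
package main

import (
	"fmt"
	"time"
)

// defaultWatchdogInterval mirrors the 500ms default shown in the diff.
const defaultWatchdogInterval = 500 * time.Millisecond

// parseWatchdogInterval is a hypothetical helper, not part of LocalAI.
// An empty string yields the default; unparsable or non-positive values
// are rejected with an error that names the offending input.
func parseWatchdogInterval(raw string) (time.Duration, error) {
	if raw == "" {
		return defaultWatchdogInterval, nil
	}
	dur, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid watchdog interval %q: %w", raw, err)
	}
	if dur <= 0 {
		return 0, fmt.Errorf("watchdog interval must be positive, got %s", dur)
	}
	return dur, nil
}

func main() {
	for _, raw := range []string{"", "500ms", "5s", "1m", "soon", "0s"} {
		dur, err := parseWatchdogInterval(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %s\n", raw, dur)
	}
}
```

Wrapping the parse error also tells the user which flag value was rejected, which the bare `return err` in the hunk does not.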
--- a/core/cli/transcript.go
+++ b/core/cli/transcript.go
@@ -31,8 +31,8 @@ type TranscriptCMD struct {
 	ModelsPath       string                                 `env:"LOCALAI_MODELS_PATH,MODELS_PATH" type:"path" default:"${basepath}/models" help:"Path containing models used for inferencing" group:"storage"`
 	BackendGalleries string                                 `env:"LOCALAI_BACKEND_GALLERIES,BACKEND_GALLERIES" help:"JSON list of backend galleries" group:"backends" default:"${backends}"`
 	Prompt           string                                 `short:"p" help:"Previous transcribed text or words that hint at what the model should expect"`
-	ResponseFormat   schema.TranscriptionResponseFormatType `short:"f" default:"" help:"Response format for Whisper models, can be one of (txt, lrc, srt, vtt, json, json_verbose)"`
-	PrettyPrint      bool                                   `help:"Used with response_format json or json_verbose for pretty printing"`
+	ResponseFormat   schema.TranscriptionResponseFormatType `short:"f" default:"" help:"Response format for Whisper models, can be one of (txt, lrc, srt, vtt, json, verbose_json)"`
+	PrettyPrint      bool                                   `help:"Used with response_format json or verbose_json for pretty printing"`
 }

 func (t *TranscriptCMD) Run(ctx *cliContext.Context) error {
--- a/core/config/application_config.go
+++ b/core/config/application_config.go
@@ -98,10 +98,11 @@ func NewApplicationConfig(o ...AppOption) *ApplicationConfig {
 		Context:                  context.Background(),
 		UploadLimitMB:            15,
 		Debug:                    true,
-		AgentJobRetentionDays:    30,              // Default: 30 days
-		LRUEvictionMaxRetries:    30,              // Default: 30 retries
-		LRUEvictionRetryInterval: 1 * time.Second, // Default: 1 second
-		TracingMaxItems:       1024,
+		AgentJobRetentionDays:    30,                     // Default: 30 days
+		LRUEvictionMaxRetries:    30,                     // Default: 30 retries
+		LRUEvictionRetryInterval: 1 * time.Second,        // Default: 1 second
+		WatchDogInterval:         500 * time.Millisecond, // Default: 500ms
+		TracingMaxItems:          1024,
 		PathWithoutAuth: []string{
 			"/static/",
 			"/generated-audio/",
@@ -208,6 +209,12 @@ func SetWatchDogIdleTimeout(t time.Duration) AppOption {
 	}
 }

+func SetWatchDogInterval(t time.Duration) AppOption {
+	return func(o *ApplicationConfig) {
+		o.WatchDogInterval = t
+	}
+}
+
 // EnableMemoryReclaimer enables memory threshold monitoring.
 // When enabled, the watchdog will evict backends if memory usage exceeds the threshold.
 // Works with GPU VRAM if available, otherwise uses system RAM.
@@ -642,7 +649,7 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings {
 		AutoloadBackendGalleries: &autoloadBackendGalleries,
 		ApiKeys:                  &apiKeys,
 		AgentJobRetentionDays:    &agentJobRetentionDays,
-		OpenResponsesStoreTTL:     &openResponsesStoreTTL,
+		OpenResponsesStoreTTL:    &openResponsesStoreTTL,
 	}
 }

--- a/core/gallery/models.go
+++ b/core/gallery/models.go
@@ -215,7 +215,7 @@ func InstallModel(ctx context.Context, systemState *system.SystemState, nameOver
 			return nil, fmt.Errorf("failed to create parent directory for prompt template %q: %v", template.Name, err)
 		}
 		// Create and write file content
-		err = os.WriteFile(filePath, []byte(template.Content), 0600)
+		err = os.WriteFile(filePath, []byte(template.Content), 0644)
 		if err != nil {
 			return nil, fmt.Errorf("failed to write prompt template %q: %v", template.Name, err)
 		}
@@ -268,7 +268,7 @@ func InstallModel(ctx context.Context, systemState *system.SystemState, nameOver
 			return nil, fmt.Errorf("failed to validate updated config YAML: %v", err)
 		}

-		err = os.WriteFile(configFilePath, updatedConfigYAML, 0600)
+		err = os.WriteFile(configFilePath, updatedConfigYAML, 0644)
 		if err != nil {
 			return nil, fmt.Errorf("failed to write updated config file: %v", err)
 		}
@@ -285,7 +285,7 @@ func InstallModel(ctx context.Context, systemState *system.SystemState, nameOver

 	xlog.Debug("Written gallery file", "file", modelFile)

-	return &modelConfig, os.WriteFile(modelFile, data, 0600)
+	return &modelConfig, os.WriteFile(modelFile, data, 0644)
 }

 func galleryFileName(name string) string {
--- a/core/http/app.go
+++ b/core/http/app.go
@@ -29,6 +29,8 @@ import (
 //go:embed static/*
 var embedDirStatic embed.FS

+var quietPaths = []string{"/api/operations", "/api/resources", "/healthz", "/readyz"}
+
 // @title LocalAI API
 // @version 2.0.0
 // @description The LocalAI Rest API.
@@ -109,10 +111,17 @@ func API(application *application.Application) (*echo.Echo, error) {
 			res := c.Response()
 			err := next(c)

-			// Fix for #7989: Reduce log verbosity of Web UI polling and resources API
-			// If the path is /api/operations or /api/resources and the request was successful (200),
-			// we log it at DEBUG level (hidden by default) instead of INFO.
-			if (req.URL.Path == "/api/operations" || req.URL.Path == "/api/resources") && res.Status == 200 {
+			// Fix for #7989: Reduce log verbosity of Web UI polling, resources API, and health checks
+			// Successful (200) requests to these paths are logged at DEBUG level (hidden by default) instead of INFO.
+			isQuietPath := false
+			for _, path := range quietPaths {
+				if req.URL.Path == path {
+					isQuietPath = true
+					break
+				}
+			}
+
+			if isQuietPath && res.Status == 200 {
 				xlog.Debug("HTTP request", "method", req.Method, "path", req.URL.Path, "status", res.Status)
 			} else {
 				xlog.Info("HTTP request", "method", req.Method, "path", req.URL.Path, "status", res.Status)
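Review note: the quiet-path decision in the hunk above can be isolated into a small pure function, which makes the "DEBUG only on 200" rule easy to unit-test. This is a sketch only: `logLevelFor` is a hypothetical helper, and the set-based lookup is swapped in for the slice loop used in the change.

```go
package main

import (
	"fmt"
	"net/http"
)

// quietPaths copies the path list from the diff into a set for direct lookup.
var quietPaths = map[string]struct{}{
	"/api/operations": {},
	"/api/resources":  {},
	"/healthz":        {},
	"/readyz":         {},
}

// logLevelFor is a hypothetical helper, not part of LocalAI.
// It returns "debug" only for a successful (200) request to a quiet path,
// so failures on polling and health endpoints remain visible at "info".
func logLevelFor(path string, status int) string {
	if _, quiet := quietPaths[path]; quiet && status == http.StatusOK {
		return "debug"
	}
	return "info"
}

func main() {
	fmt.Println(logLevelFor("/healthz", http.StatusOK))
	fmt.Println(logLevelFor("/healthz", http.StatusServiceUnavailable))
	fmt.Println(logLevelFor("/v1/chat/completions", http.StatusOK))
}
```

With only four entries the slice loop in the change is equally reasonable; the set form mainly helps if the list grows.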
--- a/core/http/endpoints/localai/edit_model.go
+++ b/core/http/endpoints/localai/edit_model.go
@@ -10,6 +10,7 @@ import (
 	"github.com/mudler/LocalAI/core/config"
 	httpUtils "github.com/mudler/LocalAI/core/http/middleware"
 	"github.com/mudler/LocalAI/internal"
+	"github.com/mudler/LocalAI/pkg/model"
 	"github.com/mudler/LocalAI/pkg/utils"

 	"gopkg.in/yaml.v3"
@@ -78,7 +79,7 @@ func GetEditModelPage(cl *config.ModelConfigLoader, appConfig *config.Applicatio
 }

 // EditModelEndpoint handles updating existing model configurations
-func EditModelEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
+func EditModelEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		modelName := c.Param("name")
 		if modelName == "" {
@@ -174,6 +175,14 @@ func EditModelEndpoint(cl *config.ModelConfigLoader, appConfig *config.Applicati
 			return c.JSON(http.StatusInternalServerError, response)
 		}

+		// Shutdown the running model to apply new configuration (e.g., context_size)
+		// The model will be reloaded on the next inference request
+		if err := ml.ShutdownModel(modelName); err != nil {
+			// Log the error but don't fail the request - the config was saved successfully
+			// The model can still be manually reloaded or restarted
+			fmt.Printf("Warning: Failed to shutdown model '%s': %v\n", modelName, err)
+		}
+
 		// Preload the model
 		if err := cl.Preload(appConfig.SystemState.Model.ModelsPath); err != nil {
 			response := ModelResponse{
@@ -186,7 +195,7 @@ func EditModelEndpoint(cl *config.ModelConfigLoader, appConfig *config.Applicati
 		// Return success response
 		response := ModelResponse{
 			Success:  true,
-			Message:  fmt.Sprintf("Model '%s' updated successfully", modelName),
+			Message:  fmt.Sprintf("Model '%s' updated successfully. The new configuration will apply the next time the model is loaded.", modelName),
 			Filename: configPath,
 			Config:   req,
 		}
--- a/core/http/endpoints/localai/tts.go
+++ b/core/http/endpoints/localai/tts.go
@@ -79,6 +79,14 @@ func TTSEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig
 			return err
 		}

+		// Resample to requested sample rate if specified
+		if input.SampleRate > 0 {
+			filePath, err = utils.AudioResample(filePath, input.SampleRate)
+			if err != nil {
+				return err
+			}
+		}
+
 		// Convert generated file to target format
 		filePath, err = utils.AudioConvert(filePath, input.Format)
 		if err != nil {
--- a/core/http/endpoints/openai/realtime.go
+++ b/core/http/endpoints/openai/realtime.go
@@ -1046,6 +1046,27 @@ func triggerResponse(session *Session, conv *Conversation, c *LockedWebsocket, o
 					Content:       content.Text,
 				})
 			}
+		} else if item.FunctionCall != nil {
+			conversationHistory = append(conversationHistory, schema.Message{
+				Role: string(types.MessageRoleAssistant),
+				ToolCalls: []schema.ToolCall{
+					{
+						ID:   item.FunctionCall.CallID,
+						Type: "function",
+						FunctionCall: schema.FunctionCall{
+							Name:      item.FunctionCall.Name,
+							Arguments: item.FunctionCall.Arguments,
+						},
+					},
+				},
+			})
+		} else if item.FunctionCallOutput != nil {
+			conversationHistory = append(conversationHistory, schema.Message{
+				Role:          "tool",
+				Name:          item.FunctionCallOutput.CallID,
+				Content:       item.FunctionCallOutput.Output,
+				StringContent: item.FunctionCallOutput.Output,
+			})
 		}
 	}
 	conv.Lock.Unlock()
--- a/core/http/routes/localai.go
+++ b/core/http/routes/localai.go
@@ -66,7 +66,7 @@ func RegisterLocalAIRoutes(router *echo.Echo,
 		router.POST("/models/import-uri", localai.ImportModelURIEndpoint(cl, appConfig, galleryService, opcache))

 		// Custom model edit endpoint
-		router.POST("/models/edit/:name", localai.EditModelEndpoint(cl, appConfig))
+		router.POST("/models/edit/:name", localai.EditModelEndpoint(cl, ml, appConfig))

 		// Reload models endpoint
 		router.POST("/models/reload", localai.ReloadModelsEndpoint(cl, appConfig))
--- a/core/http/static/chat.js
+++ b/core/http/static/chat.js
@@ -1148,6 +1148,9 @@ async function promptGPT(systemPrompt, input) {

  messages = chatStore.messages();

+  // Exclude thinking/reasoning from API payload (backend chat templates expect only system/user/assistant)
+  messages = messages.filter((m) => m.role !== "thinking" && m.role !== "reasoning");
+
  // if systemPrompt isn't empty, push it at the start of messages
  if (systemPrompt) {
    messages.unshift({
@@ -2530,12 +2533,14 @@ document.addEventListener("alpine:init", () => {
      messages() {
        const chat = this.activeChat();
        if (!chat) return [];
-        return chat.history.map((message) => ({
-          role: message.role,
-          content: message.content,
-          image: message.image,
-          audio: message.audio,
-        }));
+        return chat.history
+          .filter((message) => message.role !== "thinking" && message.role !== "reasoning")
+          .map((message) => ({
+            role: message.role,
+            content: message.content,
+            image: message.image,
+            audio: message.audio,
+          }));
      },
      
      // Getter for active chat history to ensure reactivity
--- a/core/http/views/partials/navbar.html
+++ b/core/http/views/partials/navbar.html
@@ -85,7 +85,7 @@
        <span class="nav-label">Swarm</span>
      </a>
      <a href="/manage" class="nav-item">
-        <i class="fas fa-server nav-icon"></i>
+        <i class="fas fa-desktop nav-icon"></i>
        <span class="nav-label">System</span>
      </a>
      {{ if not .DisableRuntimeSettings }}
--- a/core/http/views/traces.html
+++ b/core/http/views/traces.html
@@ -135,26 +135,44 @@
                <table class="w-full border-collapse">
                    <thead>
                        <tr class="border-b border-[var(--color-bg-secondary)]">
+                            <th class="w-8 p-2"></th>
                            <th class="text-left p-2 text-xs font-semibold text-[var(--color-text-secondary)]">Method</th>
                            <th class="text-left p-2 text-xs font-semibold text-[var(--color-text-secondary)]">Path</th>
                            <th class="text-left p-2 text-xs font-semibold text-[var(--color-text-secondary)]">Status</th>
-                            <th class="text-right p-2 text-xs font-semibold text-[var(--color-text-secondary)]">Actions</th>
                        </tr>
                    </thead>
-                    <tbody>
-                        <template x-for="(trace, index) in traces" :key="index">
-                            <tr class="hover:bg-[var(--color-bg-secondary)]/50 border-b border-[var(--color-bg-secondary)] transition-colors">
+                    <template x-for="(trace, index) in traces" :key="index">
+                        <tbody>
+                            <tr @click="toggleTrace(index)"
+                                class="cursor-pointer hover:bg-[var(--color-bg-secondary)]/50 border-b border-[var(--color-bg-secondary)] transition-colors">
+                                <td class="p-2 w-8 text-center">
+                                    <i class="fas fa-chevron-right text-xs text-[var(--color-text-secondary)] transition-transform duration-200"
+                                       :class="expandedTraces[index] ? 'rotate-90' : ''"></i>
+                                </td>
                                <td class="p-2" x-text="trace.request.method"></td>
                                <td class="p-2" x-text="trace.request.path"></td>
                                <td class="p-2" x-text="trace.response.status"></td>
-                                <td class="p-2 text-right">
-                                    <button @click="showDetails(index)" class="text-[var(--color-primary)]/60 hover:text-[var(--color-primary)] hover:bg-[var(--color-primary)]/10 rounded p-1 transition-colors">
-                                        <i class="fas fa-eye text-xs"></i>
-                                    </button>
+                            </tr>
+                            <tr x-show="expandedTraces[index]">
+                                <td colspan="4" class="p-0">
+                                    <div class="p-4 bg-[var(--color-bg-secondary)]/30 border-b border-[var(--color-bg-secondary)]">
+                                        <div class="grid grid-cols-1 lg:grid-cols-2 gap-4">
+                                            <div>
+                                                <h4 class="text-sm font-semibold text-[var(--color-text-primary)] mb-2">Request Body</h4>
+                                                <pre class="overflow-auto max-h-[70vh] p-3 rounded-lg bg-[var(--color-bg-primary)] border border-[var(--color-border-subtle)] text-xs font-mono text-[var(--color-text-secondary)] whitespace-pre-wrap break-words"
+                                                     x-text="formatTraceBody(trace.request.body)"></pre>
+                                            </div>
+                                            <div>
+                                                <h4 class="text-sm font-semibold text-[var(--color-text-primary)] mb-2">Response Body</h4>
+                                                <pre class="overflow-auto max-h-[70vh] p-3 rounded-lg bg-[var(--color-bg-primary)] border border-[var(--color-border-subtle)] text-xs font-mono text-[var(--color-text-secondary)] whitespace-pre-wrap break-words"
+                                                     x-text="formatTraceBody(trace.response.body)"></pre>
+                                            </div>
+                                        </div>
+                                    </div>
                                </td>
                            </tr>
-                        </template>
-                    </tbody>
+                        </tbody>
+                    </template>
                </table>
                <div x-show="traces.length === 0" class="text-center py-8 text-[var(--color-text-secondary)] text-sm">
                    No API traces recorded yet.
@@ -168,18 +186,23 @@
                <table class="w-full border-collapse">
                    <thead>
                        <tr class="border-b border-[var(--color-bg-secondary)]">
+                            <th class="w-8 p-2"></th>
                            <th class="text-left p-2 text-xs font-semibold text-[var(--color-text-secondary)]">Type</th>
                            <th class="text-left p-2 text-xs font-semibold text-[var(--color-text-secondary)]">Timestamp</th>
                            <th class="text-left p-2 text-xs font-semibold text-[var(--color-text-secondary)]">Model</th>
                            <th class="text-left p-2 text-xs font-semibold text-[var(--color-text-secondary)]">Summary</th>
                            <th class="text-left p-2 text-xs font-semibold text-[var(--color-text-secondary)]">Duration</th>
                            <th class="text-left p-2 text-xs font-semibold text-[var(--color-text-secondary)]">Status</th>
-                            <th class="text-right p-2 text-xs font-semibold text-[var(--color-text-secondary)]">Actions</th>
                        </tr>
                    </thead>
-                    <tbody>
-                        <template x-for="(trace, index) in backendTraces" :key="index">
-                            <tr class="hover:bg-[var(--color-bg-secondary)]/50 border-b border-[var(--color-bg-secondary)] transition-colors">
+                    <template x-for="(trace, index) in backendTraces" :key="index">
+                        <tbody>
+                            <tr @click="toggleBackendTrace(index)"
+                                class="cursor-pointer hover:bg-[var(--color-bg-secondary)]/50 border-b border-[var(--color-bg-secondary)] transition-colors">
+                                <td class="p-2 w-8 text-center">
+                                    <i class="fas fa-chevron-right text-xs text-[var(--color-text-secondary)] transition-transform duration-200"
+                                       :class="expandedBackendTraces[index] ? 'rotate-90' : ''"></i>
+                                </td>
                                <td class="p-2">
                                    <span class="inline-flex items-center px-2 py-0.5 rounded text-xs font-medium"
                                          :class="getTypeClass(trace.type)"
@@ -197,14 +220,82 @@
                                        <i class="fas fa-times-circle text-red-500 text-xs" :title="trace.error"></i>
                                    </template>
                                </td>
-                                <td class="p-2 text-right">
-                                    <button @click="showBackendDetails(index)" class="text-[var(--color-primary)]/60 hover:text-[var(--color-primary)] hover:bg-[var(--color-primary)]/10 rounded p-1 transition-colors">
-                                        <i class="fas fa-eye text-xs"></i>
-                                    </button>
+                            </tr>
+                            <tr x-show="expandedBackendTraces[index]">
+                                <td colspan="7" class="p-0">
+                                    <div class="p-4 bg-[var(--color-bg-secondary)]/30 border-b border-[var(--color-bg-secondary)]">
+                                        <!-- Header info -->
+                                        <div class="grid grid-cols-2 md:grid-cols-4 gap-3 mb-4">
+                                            <div class="bg-[var(--color-bg-primary)] rounded-lg p-3 border border-[var(--color-border-subtle)]">
+                                                <div class="text-xs text-[var(--color-text-secondary)] mb-1">Type</div>
+                                                <span class="inline-flex items-center px-2 py-0.5 rounded text-xs font-medium"
+                                                      :class="getTypeClass(trace.type)"
+                                                      x-text="trace.type"></span>
+                                            </div>
+                                            <div class="bg-[var(--color-bg-primary)] rounded-lg p-3 border border-[var(--color-border-subtle)]">
+                                                <div class="text-xs text-[var(--color-text-secondary)] mb-1">Model</div>
+                                                <div class="text-sm font-medium" x-text="trace.model_name || '-'"></div>
+                                            </div>
+                                            <div class="bg-[var(--color-bg-primary)] rounded-lg p-3 border border-[var(--color-border-subtle)]">
+                                                <div class="text-xs text-[var(--color-text-secondary)] mb-1">Backend</div>
+                                                <div class="text-sm font-medium" x-text="trace.backend || '-'"></div>
+                                            </div>
+                                            <div class="bg-[var(--color-bg-primary)] rounded-lg p-3 border border-[var(--color-border-subtle)]">
+                                                <div class="text-xs text-[var(--color-text-secondary)] mb-1">Duration</div>
+                                                <div class="text-sm font-medium" x-text="formatDuration(trace.duration)"></div>
+                                            </div>
+                                        </div>
+
+                                        <!-- Error banner -->
+                                        <div x-show="trace.error" class="bg-red-500/10 border border-red-500/30 rounded-lg p-3 mb-4">
+                                            <div class="flex items-center gap-2">
+                                                <i class="fas fa-exclamation-triangle text-red-500 text-sm"></i>
+                                                <span class="text-sm text-red-400" x-text="trace.error"></span>
+                                            </div>
+                                        </div>
+
+                                        <!-- Data fields as nested accordions -->
+                                        <template x-if="trace.data && Object.keys(trace.data).length > 0">
+                                            <div>
+                                                <h4 class="text-sm font-semibold text-[var(--color-text-primary)] mb-2">Data Fields</h4>
+                                                <div class="border border-[var(--color-border-subtle)] rounded-lg overflow-hidden">
+                                                    <template x-for="[key, value] in Object.entries(trace.data)" :key="key">
+                                                        <div class="border-b border-[var(--color-border-subtle)] last:border-b-0">
+                                                            <!-- Field header row -->
+                                                            <div @click="isLargeValue(value) && toggleBackendField(index, key)"
+                                                                 class="flex items-center gap-2 px-3 py-2 hover:bg-[var(--color-bg-primary)]/50 transition-colors"
+                                                                 :class="isLargeValue(value) ? 'cursor-pointer' : ''">
+                                                                <template x-if="isLargeValue(value)">
+                                                                    <i class="fas fa-chevron-right text-[10px] text-[var(--color-text-secondary)] transition-transform duration-200 w-3 flex-shrink-0"
+                                                                       :class="isBackendFieldExpanded(index, key) ? 'rotate-90' : ''"></i>
+                                                                </template>
+                                                                <template x-if="!isLargeValue(value)">
+                                                                    <span class="w-3 flex-shrink-0"></span>
+                                                                </template>
+                                                                <span class="text-sm font-mono text-[var(--color-primary)] flex-shrink-0" x-text="key"></span>
+                                                                <template x-if="!isLargeValue(value)">
+                                                                    <span class="font-mono text-xs text-[var(--color-text-secondary)]" x-text="formatValue(value)"></span>
+                                                                </template>
+                                                                <template x-if="isLargeValue(value) && !isBackendFieldExpanded(index, key)">
+                                                                    <span class="text-xs text-[var(--color-text-secondary)] truncate" x-text="truncateValue(value, 120)"></span>
+                                                                </template>
+                                                            </div>
+                                                            <!-- Expanded field value -->
+                                                            <div x-show="isLargeValue(value) && isBackendFieldExpanded(index, key)"
+                                                                 class="px-3 pb-3">
+                                                                <pre class="overflow-auto max-h-[70vh] p-3 rounded-lg bg-[var(--color-bg-primary)] border border-[var(--color-border-subtle)] text-xs font-mono text-[var(--color-text-secondary)] whitespace-pre-wrap break-words"
+                                                                     x-text="formatLargeValue(value)"></pre>
+                                                            </div>
+                                                        </div>
+                                                    </template>
+                                                </div>
+                                            </div>
+                                        </template>
+                                    </div>
                                </td>
                            </tr>
-                        </template>
-                    </tbody>
+                        </tbody>
+                    </template>
                </table>
                <div x-show="backendTraces.length === 0" class="text-center py-8 text-[var(--color-text-secondary)] text-sm">
                    No backend traces recorded yet.
@@ -212,149 +303,20 @@
            </div>
        </div>

-        <!-- API Trace Details Modal -->
-        <div x-show="selectedTrace !== null" class="fixed inset-0 bg-black/50 flex items-center justify-center z-50" @click="selectedTrace = null">
-            <div class="bg-[var(--color-bg-secondary)] rounded-lg p-6 max-w-4xl w-full max-h-[90vh] overflow-auto" @click.stop>
-                <div class="flex justify-between mb-4">
-                    <h2 class="h3">API Trace Details</h2>
-                    <button @click="selectedTrace = null" class="text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)]">
-                        <i class="fas fa-times"></i>
-                    </button>
-                </div>
-                <div class="grid grid-cols-2 gap-4">
-                    <div>
-                        <h3 class="text-lg font-semibold mb-2">Request Body</h3>
-                        <div id="requestEditor" class="h-96 border border-[var(--color-primary-border)]/20"></div>
-                    </div>
-                    <div>
-                        <h3 class="text-lg font-semibold mb-2">Response Body</h3>
-                        <div id="responseEditor" class="h-96 border border-[var(--color-primary-border)]/20"></div>
-                    </div>
-                </div>
-            </div>
-        </div>
-
-        <!-- Backend Trace Details Modal -->
-        <div x-show="selectedBackendTrace !== null" class="fixed inset-0 bg-black/50 flex items-center justify-center z-50" @click="selectedBackendTrace = null; detailKey = null; detailValue = null;">
-            <div class="bg-[var(--color-bg-secondary)] rounded-lg p-6 max-w-4xl w-full max-h-[90vh] overflow-auto" @click.stop>
-                <template x-if="selectedBackendTrace !== null">
-                    <div>
-                        <div class="flex justify-between mb-4">
-                            <h2 class="h3">Backend Trace Details</h2>
-                            <button @click="selectedBackendTrace = null; detailKey = null; detailValue = null;" class="text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)]">
-                                <i class="fas fa-times"></i>
-                            </button>
-                        </div>
-
-                        <!-- Header info -->
-                        <div class="grid grid-cols-4 gap-4 mb-4">
-                            <div class="bg-[var(--color-bg-primary)] rounded p-3">
-                                <div class="text-xs text-[var(--color-text-secondary)] mb-1">Type</div>
-                                <span class="inline-flex items-center px-2 py-0.5 rounded text-xs font-medium"
-                                      :class="getTypeClass(backendTraces[selectedBackendTrace].type)"
-                                      x-text="backendTraces[selectedBackendTrace].type"></span>
-                            </div>
-                            <div class="bg-[var(--color-bg-primary)] rounded p-3">
-                                <div class="text-xs text-[var(--color-text-secondary)] mb-1">Model</div>
-                                <div class="text-sm font-medium" x-text="backendTraces[selectedBackendTrace].model_name || '-'"></div>
-                            </div>
-                            <div class="bg-[var(--color-bg-primary)] rounded p-3">
-                                <div class="text-xs text-[var(--color-text-secondary)] mb-1">Backend</div>
-                                <div class="text-sm font-medium" x-text="backendTraces[selectedBackendTrace].backend || '-'"></div>
-                            </div>
-                            <div class="bg-[var(--color-bg-primary)] rounded p-3">
-                                <div class="text-xs text-[var(--color-text-secondary)] mb-1">Duration</div>
-                                <div class="text-sm font-medium" x-text="formatDuration(backendTraces[selectedBackendTrace].duration)"></div>
-                            </div>
-                        </div>
-
-                        <!-- Error banner -->
-                        <div x-show="backendTraces[selectedBackendTrace].error" class="bg-red-500/10 border border-red-500/30 rounded-lg p-3 mb-4">
-                            <div class="flex items-center gap-2">
-                                <i class="fas fa-exclamation-triangle text-red-500 text-sm"></i>
-                                <span class="text-sm text-red-400" x-text="backendTraces[selectedBackendTrace].error"></span>
-                            </div>
-                        </div>
-
-                        <!-- Data fields table -->
-                        <div class="overflow-x-auto">
-                            <table class="w-full border-collapse">
-                                <thead>
-                                    <tr class="border-b border-[var(--color-bg-primary)]">
-                                        <th class="text-left p-2 text-xs font-semibold text-[var(--color-text-secondary)] w-1/4">Field</th>
-                                        <th class="text-left p-2 text-xs font-semibold text-[var(--color-text-secondary)]">Value</th>
-                                    </tr>
-                                </thead>
-                                <tbody>
-                                    <template x-for="[key, value] in getDataEntries(selectedBackendTrace)" :key="key">
-                                        <tr class="border-b border-[var(--color-bg-primary)] hover:bg-[var(--color-bg-primary)]/50 transition-colors">
-                                            <td class="p-2 text-sm font-mono text-[var(--color-primary)]" x-text="key"></td>
-                                            <td class="p-2 text-sm">
-                                                <template x-if="isLargeValue(value)">
-                                                    <button @click="showValueDetail(key, value)"
-                                                            class="text-left max-w-full">
-                                                        <span class="block truncate max-w-lg text-[var(--color-text-secondary)]" x-text="truncateValue(value, 120)"></span>
-                                                        <span class="text-xs text-[var(--color-primary)] hover:underline mt-0.5 inline-block">View full value</span>
-                                                    </button>
-                                                </template>
-                                                <template x-if="!isLargeValue(value)">
-                                                    <span class="font-mono text-xs" x-text="formatValue(value)"></span>
-                                                </template>
-                                            </td>
-                                        </tr>
-                                    </template>
-                                </tbody>
-                            </table>
-                        </div>
-                    </div>
-                </template>
-            </div>
-        </div>
-
-        <!-- Value Detail Modal -->
-        <div x-show="detailValue !== null" class="fixed inset-0 bg-black/50 flex items-center justify-center z-[60]" @click="detailValue = null; detailKey = null;">
-            <div class="bg-[var(--color-bg-secondary)] rounded-lg p-6 max-w-4xl w-full max-h-[90vh] overflow-auto" @click.stop>
-                <div class="flex justify-between mb-4">
-                    <h2 class="h3 font-mono" x-text="detailKey"></h2>
-                    <button @click="detailValue = null; detailKey = null;" class="text-[var(--color-text-secondary)] hover:text-[var(--color-text-primary)]">
-                        <i class="fas fa-times"></i>
-                    </button>
-                </div>
-                <div id="detailEditor" class="h-[70vh] border border-[var(--color-primary-border)]/20"></div>
-            </div>
-        </div>
-
    </div>


 </div>

-<!-- CodeMirror -->
-<link rel="stylesheet" href="static/assets/codemirror.min.css">
-<script src="static/assets/codemirror.min.js"></script>
-<script src="static/assets/javascript.min.js"></script>
-
-<!-- Styles from model-editor -->
-<style>
-.CodeMirror {
-    height: 100% !important;
-    font-family: monospace;
-}
-</style>
-
 <script>
 function tracesApp() {
    return {
        activeTab: 'api',
        traces: [],
        backendTraces: [],
-        selectedTrace: null,
-        selectedBackendTrace: null,
-        detailKey: null,
-        detailValue: null,
-        requestEditor: null,
-        responseEditor: null,
-        detailEditor: null,
+        expandedTraces: {},
+        expandedBackendTraces: {},
+        expandedBackendFields: {},
        notifications: [],
        settings: {
            enable_tracing: false,
@@ -474,6 +436,7 @@ function tracesApp() {
            if (confirm('Clear all API traces?')) {
                await fetch('/api/traces/clear', { method: 'POST' });
                this.traces = [];
+                this.expandedTraces = {};
            }
        },

@@ -481,101 +444,67 @@ function tracesApp() {
            if (confirm('Clear all backend traces?')) {
                await fetch('/api/backend-traces/clear', { method: 'POST' });
                this.backendTraces = [];
+                this.expandedBackendTraces = {};
+                this.expandedBackendFields = {};
            }
        },

-        showDetails(index) {
-            this.selectedTrace = index;
-            this.$nextTick(() => {
-                const trace = this.traces[index];
-
-                const decodeBase64 = (base64) => {
-                    const binaryString = atob(base64);
-                    const bytes = new Uint8Array(binaryString.length);
-                    for (let i = 0; i < binaryString.length; i++) {
-                        bytes[i] = binaryString.charCodeAt(i);
-                    }
-                    return new TextDecoder().decode(bytes);
-                };
-
-                const formatBody = (bodyText) => {
-                    try {
-                        const json = JSON.parse(bodyText);
-                        return JSON.stringify(json, null, 2);
-                    } catch {
-                        return bodyText;
-                    }
-                };
-
-                const reqBody = formatBody(decodeBase64(trace.request.body));
-                const resBody = formatBody(decodeBase64(trace.response.body));
-
-                if (!this.requestEditor) {
-                    this.requestEditor = CodeMirror(document.getElementById('requestEditor'), {
-                        value: reqBody,
-                        mode: 'javascript',
-                        json: true,
-                        theme: 'default',
-                        lineNumbers: true,
-                        readOnly: true,
-                        lineWrapping: true
-                    });
-                } else {
-                    this.requestEditor.setValue(reqBody);
-                }
-
-                if (!this.responseEditor) {
-                    this.responseEditor = CodeMirror(document.getElementById('responseEditor'), {
-                        value: resBody,
-                        mode: 'javascript',
-                        json: true,
-                        theme: 'default',
-                        lineNumbers: true,
-                        readOnly: true,
-                        lineWrapping: true
-                    });
-                } else {
-                    this.responseEditor.setValue(resBody);
-                }
-            });
+        toggleTrace(index) {
+            this.expandedTraces = {
+                ...this.expandedTraces,
+                [index]: !this.expandedTraces[index]
+            };
        },

-        showBackendDetails(index) {
-            this.selectedBackendTrace = index;
+        toggleBackendTrace(index) {
+            this.expandedBackendTraces = {
+                ...this.expandedBackendTraces,
+                [index]: !this.expandedBackendTraces[index]
+            };
        },

-        showValueDetail(key, value) {
-            this.detailKey = key;
-            let formatted = '';
+        toggleBackendField(index, key) {
+            const fieldKey = index + '-' + key;
+            this.expandedBackendFields = {
+                ...this.expandedBackendFields,
+                [fieldKey]: !this.expandedBackendFields[fieldKey]
+            };
+        },
+
+        isBackendFieldExpanded(index, key) {
+            return !!this.expandedBackendFields[index + '-' + key];
+        },
+
+        formatTraceBody(body) {
+            try {
+                const binaryString = atob(body);
+                const bytes = new Uint8Array(binaryString.length);
+                for (let i = 0; i < binaryString.length; i++) {
+                    bytes[i] = binaryString.charCodeAt(i);
+                }
+                const text = new TextDecoder().decode(bytes);
+                try {
+                    return JSON.stringify(JSON.parse(text), null, 2);
+                } catch {
+                    return text;
+                }
+            } catch {
+                return body || '';
+            }
+        },
+
+        formatLargeValue(value) {
            if (typeof value === 'string') {
                try {
-                    const parsed = JSON.parse(value);
-                    formatted = JSON.stringify(parsed, null, 2);
+                    return JSON.stringify(JSON.parse(value), null, 2);
                } catch {
-                    formatted = value;
+                    return value;
                }
-            } else if (typeof value === 'object') {
-                formatted = JSON.stringify(value, null, 2);
-            } else {
-                formatted = String(value);
            }
-            this.detailValue = formatted;
-
-            this.$nextTick(() => {
-                const el = document.getElementById('detailEditor');
-                if (el) {
-                    el.innerHTML = '';
-                    this.detailEditor = CodeMirror(el, {
-                        value: formatted,
-                        mode: 'javascript',
-                        json: true,
-                        theme: 'default',
-                        lineNumbers: true,
-                        readOnly: true,
-                        lineWrapping: true
-                    });
-                }
-            });
+            if (typeof value === 'object') {
+                return JSON.stringify(value, null, 2);
+            }
+            return String(value);
        },

        formatTimestamp(ts) {
@@ -623,12 +552,6 @@ function tracesApp() {
            if (typeof value === 'boolean') return value ? 'true' : 'false';
            if (typeof value === 'object') return JSON.stringify(value);
            return String(value);
-        },
-
-        getDataEntries(index) {
-            const trace = this.backendTraces[index];
-            if (!trace || !trace.data) return [];
-            return Object.entries(trace.data);
        }
    }
 }
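
Review note: the helpers this hunk adds to `tracesApp()` can be run outside Alpine as plain functions. The sketch below is a minimal standalone version, assuming a runtime with global `atob` and `TextDecoder`. The names `toggleKey` and `backendFieldKey` are hypothetical and only stand in for the inline logic in `toggleTrace`, `toggleBackendField` and `isBackendFieldExpanded`. The empty-input guard is an addition for this sketch and is not in the diff.

```javascript
// Sketch only: standalone versions of helpers added in the hunk above.
// Assumes a runtime with global atob and TextDecoder (browsers, Node.js 16+).

// Decode a base64 body to UTF-8 text and pretty-print it when it is JSON.
function formatTraceBody(body) {
    // Guard added for this sketch (not in the diff): in browsers atob(null)
    // coerces null to the string "null", which happens to be valid base64.
    if (typeof body !== 'string' || body === '') return '';
    try {
        const binaryString = atob(body);
        const bytes = Uint8Array.from(binaryString, (ch) => ch.charCodeAt(0));
        const text = new TextDecoder().decode(bytes);
        try {
            return JSON.stringify(JSON.parse(text), null, 2);
        } catch {
            return text; // valid base64, but not JSON
        }
    } catch {
        return body; // not valid base64: show the raw value
    }
}

// Return a new map with one key flipped; the input map is left untouched.
function toggleKey(expanded, key) {
    return { ...expanded, [key]: !expanded[key] };
}

// Composite key for per-field state, mirroring index + '-' + key in the diff.
function backendFieldKey(index, key) {
    return index + '-' + key;
}
```

Returning a fresh object from `toggleKey`, rather than mutating the existing one, matches the spread-and-reassign pattern the diff uses for `expandedTraces` and its siblings.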
--- a/core/schema/localai.go
+++ b/core/schema/localai.go
@@ -53,7 +53,8 @@ type TTSRequest struct {
 	Backend  string `json:"backend" yaml:"backend"`
 	Language string `json:"language,omitempty" yaml:"language,omitempty"`               // (optional) language to use with TTS model
 	Format   string `json:"response_format,omitempty" yaml:"response_format,omitempty"` // (optional) output format
-	Stream   bool   `json:"stream,omitempty" yaml:"stream,omitempty"`                   // (optional) enable streaming TTS
+	Stream     bool   `json:"stream,omitempty" yaml:"stream,omitempty"`                         // (optional) enable streaming TTS
+	SampleRate int    `json:"sample_rate,omitempty" yaml:"sample_rate,omitempty"`               // (optional) desired output sample rate
 }

 // @Description VAD request body
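
Review note: a client could populate the new `sample_rate` field as sketched below. This is a hypothetical helper, not part of LocalAI. It uses only the JSON tags visible in the `TTSRequest` hunk, plus an `input` key that is an assumption because it does not appear in the lines shown here. Whether a given backend honors `sample_rate` is not stated in this diff.

```javascript
// Hypothetical helper, not part of LocalAI: builds a JSON body using the
// json tags visible in the TTSRequest hunk above. The `input` key is an
// assumption; it does not appear in the lines shown in this diff.
function buildTTSRequest({ input, backend = '', language, responseFormat, stream, sampleRate } = {}) {
    // `backend` has no omitempty tag in the struct, so it is always sent.
    const body = { input, backend };
    // Mirror Go's omitempty for the optional fields: skip empty strings,
    // false and 0. This sketch also skips non-integer or negative rates.
    if (language) body.language = language;
    if (responseFormat) body.response_format = responseFormat;
    if (stream) body.stream = true;
    if (Number.isInteger(sampleRate) && sampleRate > 0) body.sample_rate = sampleRate;
    return body;
}
```

For example, `buildTTSRequest({ input: 'Hello', backend: 'faster-qwen3-tts', sampleRate: 24000 })` yields a body with `input`, `backend` and `sample_rate` only.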
--- a/core/schema/openai.go
+++ b/core/schema/openai.go
@@ -115,7 +115,7 @@ const (
 	TranscriptionResponseFormatVtt         = TranscriptionResponseFormatType("vtt")
 	TranscriptionResponseFormatLrc         = TranscriptionResponseFormatType("lrc")
 	TranscriptionResponseFormatJson        = TranscriptionResponseFormatType("json")
-	TranscriptionResponseFormatJsonVerbose = TranscriptionResponseFormatType("json_verbose")
+	TranscriptionResponseFormatJsonVerbose = TranscriptionResponseFormatType("verbose_json")
 )

 type ChatCompletionResponseFormat struct {
--- a/docker-compose.yaml
+++ b/docker-compose.yaml
@@ -20,10 +20,14 @@ services:
      - MODELS_PATH=/models
    #  - DEBUG=true
    volumes:
-      - ./models:/models:cached
-      - ./images/:/tmp/generated/images/
+      - models:/models
+      - images:/tmp/generated/images/
    command:
    # Here we can specify a list of models to run (see quickstart https://localai.io/basics/getting_started/#running-models )
    # or an URL pointing to a YAML configuration file, for example:
    # - https://gist.githubusercontent.com/mudler/ad601a0488b497b69ec549150d9edd18/raw/a8a8869ef1bb7e3830bf5c0bae29a0cce991ff8d/phi-2.yaml
    - phi-2
+
+volumes:
+  models:
+  images:
--- a/docs/content/features/GPU-acceleration.md
+++ b/docs/content/features/GPU-acceleration.md
@@ -60,6 +60,22 @@ diffusers:
  scheduler_type: "k_dpmpp_sde"
 ```

+### Multi-GPU Support
+
+To run the diffusers backend across multiple GPUs, set `tensor_parallel_size` in the model configuration to the number of GPUs you want to use.
+
+```yaml
+name: stable-diffusion-multigpu
+model: stabilityai/stable-diffusion-xl-base-1.0
+backend: diffusers
+parameters:
+  tensor_parallel_size: 2 # Number of GPUs to use
+```
+
+The `tensor_parallel_size` parameter corresponds to field 55 of the `ModelOptions` message in the gRPC proto. When it is set to a value greater than 1, the diffusers backend automatically enables `device_map="auto"` to distribute the model across the available GPUs.
+
+When using diffusers with multiple GPUs, ensure you have sufficient GPU memory across all devices. For optimal performance, use GPUs of the same type and memory capacity.
+
 ## CUDA(NVIDIA) acceleration

 ### Requirements
--- a/docs/content/features/audio-to-text.md
+++ b/docs/content/features/audio-to-text.md
@@ -57,7 +57,7 @@ Result:

 ---

-You can also specify the `response_format` parameter to be one of `lrc`, `srt`, `vtt`, `text`, `json` or `json_verbose` (default):
+You can also specify the `response_format` parameter to be one of `lrc`, `srt`, `vtt`, `text`, `json` or `verbose_json` (default):
 ```bash
 ## Send the example audio file to the transcriptions endpoint
 curl http://localhost:8080/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file="@$PWD/gb1.ogg" -F model="whisper-1" -F response_format="srt"
--- a/docs/content/installation/_index.en.md
+++ b/docs/content/installation/_index.en.md
@@ -8,34 +8,32 @@ icon: download

 LocalAI can be installed in multiple ways depending on your platform and preferences.

-{{% notice tip %}}
-**Recommended: Docker Installation**
-
-**Docker is the recommended installation method** for most users as it works across all platforms (Linux, macOS, Windows) and provides the easiest setup experience. It's the fastest way to get started with LocalAI.
-{{% /notice %}}
-
 ## Installation Methods

 Choose the installation method that best suits your needs:

-1. **[Docker](docker/)** ⭐ **Recommended** - Works on all platforms, easiest setup
+1. **[Containers](containers/)** ⭐ **Recommended** - Works on all platforms, supports Docker and Podman
 2. **[macOS](macos/)** - Download and install the DMG application
-3. **[Linux](linux/)** - Install on Linux using binaries (install.sh script currently has issues - see [issue #8032](https://github.com/mudler/LocalAI/issues/8032))
+3. **[Linux](linux/)** - Install on Linux using binaries
 4. **[Kubernetes](kubernetes/)** - Deploy LocalAI on Kubernetes clusters
 5. **[Build from Source](build/)** - Build LocalAI from source code

 ## Quick Start

-**Recommended: Docker (works on all platforms)**
+**Recommended: Containers (Docker or Podman)**

 ```bash
+# With Docker
 docker run -p 8080:8080 --name local-ai -ti localai/localai:latest
+
+# Or with Podman
+podman run -p 8080:8080 --name local-ai -ti localai/localai:latest
 ```

 This will start LocalAI. The API will be available at `http://localhost:8080`. For images with pre-configured models, see [All-in-One images](/getting-started/container-images/#all-in-one-images).

 For other platforms:
 - **macOS**: Download the [DMG](macos/)
- **Linux**: See the [Linux installation guide](linux/) for installation options. **Note:** The `install.sh` script is currently experiencing issues - see [issue #8032](https://github.com/mudler/LocalAI/issues/8032) for details.
+- **Linux**: See the [Linux installation guide](linux/) for binary installation.

-For detailed instructions, see the [Docker installation guide](docker/).
+For detailed instructions, see the [Containers installation guide](containers/).
--- a/docs/content/installation/containers.md
+++ b/docs/content/installation/containers.md
@@ -0,0 +1,258 @@
+---
+title: Containers
+description: Install and use LocalAI with container engines (Docker, Podman)
+weight: 1
+url: '/installation/containers/'
+---
+
+LocalAI supports Docker, Podman, and other OCI-compatible container engines. This guide covers the common aspects of running LocalAI in containers.
+
+## Prerequisites
+
+Before you begin, ensure you have a container engine installed:
+
+- [Install Docker](https://docs.docker.com/get-docker/) (Mac, Windows, Linux)
+- [Install Podman](https://podman.io/getting-started/installation) (Linux, macOS, Windows WSL2)
+
+## Quick Start
+
+The fastest way to get started is with the CPU image:
+
+```bash
+docker run -p 8080:8080 --name local-ai -ti localai/localai:latest
+# Or with Podman:
+podman run -p 8080:8080 --name local-ai -ti localai/localai:latest
+```
+
+This will:
+- Start LocalAI (you'll need to install models separately)
+- Make the API available at `http://localhost:8080`
+
+## Image Types
+
+LocalAI provides several image types to suit different needs. These images work with both Docker and Podman.
+
+### Standard Images
+
+Standard images don't include pre-configured models. Use these if you want to configure models manually.
+
+#### CPU Image
+
+```bash
+docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
+# Or with Podman:
+podman run -ti --name local-ai -p 8080:8080 localai/localai:latest
+```
+
+#### GPU Images
+
+**NVIDIA CUDA 13:**
+```bash
+docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
+# Or with Podman:
+podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-gpu-nvidia-cuda-13
+```
+
+**NVIDIA CUDA 12:**
+```bash
+docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
+# Or with Podman:
+podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-gpu-nvidia-cuda-12
+```
+
+**AMD GPU (ROCm):**
+```bash
+docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
+# Or with Podman:
+podman run -ti --name local-ai -p 8080:8080 --device rocm.com/gpu=all localai/localai:latest-gpu-hipblas
+```
+
+**Intel GPU:**
+```bash
+docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel
+# Or with Podman:
+podman run -ti --name local-ai -p 8080:8080 --device gpu.intel.com/all localai/localai:latest-gpu-intel
+```
+
+**Vulkan:**
+```bash
+docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
+# Or with Podman:
+podman run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
+```
+
+**NVIDIA Jetson (L4T ARM64):**
+
+CUDA 12 (for NVIDIA AGX Orin and similar platforms):
+```bash
+docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64
+```
+
+CUDA 13 (for NVIDIA DGX Spark):
+```bash
+docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
+```
+
+### All-in-One (AIO) Images
+
+**Recommended for beginners** - These images come pre-configured with models and backends, ready to use immediately.
+
+#### CPU Image
+
+```bash
+docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
+# Or with Podman:
+podman run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
+```
+
+#### GPU Images
+
+**NVIDIA CUDA 13:**
+```bash
+docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13
+# Or with Podman:
+podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-aio-gpu-nvidia-cuda-13
+```
+
+**NVIDIA CUDA 12:**
+```bash
+docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
+# Or with Podman:
+podman run -ti --name local-ai -p 8080:8080 --device nvidia.com/gpu=all localai/localai:latest-aio-gpu-nvidia-cuda-12
+```
+
+**AMD GPU (ROCm):**
+```bash
+docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas
+# Or with Podman:
+podman run -ti --name local-ai -p 8080:8080 --device rocm.com/gpu=all localai/localai:latest-aio-gpu-hipblas
+```
+
+**Intel GPU:**
+```bash
+docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel
+# Or with Podman:
+podman run -ti --name local-ai -p 8080:8080 --device gpu.intel.com/all localai/localai:latest-aio-gpu-intel
+```
+
+## Using Compose
+
+For a more manageable setup, especially with persistent volumes, use Docker Compose or Podman Compose:
+
+```yaml
+version: "3.9"
+services:
+  api:
+    image: localai/localai:latest-aio-cpu
+    # For GPU support, use one of:
+    # image: localai/localai:latest-aio-gpu-nvidia-cuda-13
+    # image: localai/localai:latest-aio-gpu-nvidia-cuda-12
+    # image: localai/localai:latest-aio-gpu-nvidia-cuda-11
+    # image: localai/localai:latest-aio-gpu-hipblas
+    # image: localai/localai:latest-aio-gpu-intel
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
+      interval: 1m
+      timeout: 20m
+      retries: 5
+    ports:
+      - 8080:8080
+    environment:
+      - DEBUG=false
+    volumes:
+      - ./models:/models:cached
+    # For NVIDIA GPUs, uncomment:
+    # deploy:
+    #   resources:
+    #     reservations:
+    #       devices:
+    #         - driver: nvidia
+    #           count: 1
+    #           capabilities: [gpu]
+```
+
+Save this as `compose.yaml` and run:
+
+```bash
+docker compose up -d
+# Or with Podman:
+podman-compose up -d
+```
+
+## Persistent Storage
+
+To persist models and configurations, mount a volume:
+
+```bash
+docker run -ti --name local-ai -p 8080:8080 \
+  -v $PWD/models:/models \
+  localai/localai:latest-aio-cpu
+# Or with Podman:
+podman run -ti --name local-ai -p 8080:8080 \
+  -v $PWD/models:/models \
+  localai/localai:latest-aio-cpu
+```
+
+Or use a named volume:
+
+```bash
+docker volume create localai-models
+docker run -ti --name local-ai -p 8080:8080 \
+  -v localai-models:/models \
+  localai/localai:latest-aio-cpu
+# Or with Podman:
+podman volume create localai-models
+podman run -ti --name local-ai -p 8080:8080 \
+  -v localai-models:/models \
+  localai/localai:latest-aio-cpu
+```
+
+## What's Included in AIO Images
+
+All-in-One images come pre-configured with:
+
+- **Text Generation**: LLM models for chat and completion
+- **Image Generation**: Stable Diffusion models
+- **Text to Speech**: TTS models
+- **Speech to Text**: Whisper models
+- **Embeddings**: Vector embedding models
+- **Function Calling**: Support for OpenAI-compatible function calling
+
+The AIO images use OpenAI-compatible model names (like `gpt-4`, `gpt-4-vision-preview`) but are backed by open-source models. See the [container images documentation](/getting-started/container-images/#all-in-one-images) for the complete mapping.
+
+## Next Steps
+
+After installation:
+
+1. Access the WebUI at `http://localhost:8080`
+2. Check available models: `curl http://localhost:8080/v1/models`
+3. [Install additional models](/getting-started/models/)
+4. [Try out examples](/getting-started/try-it-out/)
+
+## Troubleshooting
+
+### Container won't start
+
+- Check that the container engine is running: `docker ps` or `podman ps`
+- Check that port 8080 is available: `netstat -an | grep 8080` (Linux/macOS)
+- View logs: `docker logs local-ai` or `podman logs local-ai`
+
+### GPU not detected
+
+- Ensure Docker has GPU access: `docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi`
+- For Podman, see the [Podman installation guide](/installation/podman/#gpu-not-detected)
+- For NVIDIA: Install [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)
+- For AMD: Ensure devices are accessible: `ls -la /dev/kfd /dev/dri`
+
+### Models not downloading
+
+- Check internet connection
+- Verify disk space: `df -h`
+- Check container logs for errors: `docker logs local-ai` or `podman logs local-ai`
+
+## See Also
+
+- [Container Images Reference](/getting-started/container-images/) - Complete image reference
+- [Install Models](/getting-started/models/) - Install and configure models
+- [GPU Acceleration](/features/gpu-acceleration/) - GPU setup and optimization
+- [Kubernetes Installation](/installation/kubernetes/) - Deploy on Kubernetes
--- a/docs/content/installation/docker.md
+++ b/docs/content/installation/docker.md
@@ -1,249 +1,9 @@
 ---
 title: "Docker Installation"
 description: "Install LocalAI using Docker containers - the recommended installation method"
-weight: 1
+weight: 2
 url: '/installation/docker/'
+redirectURI: '/installation/containers/'
 ---

-{{% notice tip %}}
-**Recommended Installation Method**
-
-Docker is the recommended way to install LocalAI and provides the easiest setup experience.
-{{% /notice %}}
-
-LocalAI provides Docker images that work with Docker, Podman, and other container engines. These images are available on [Docker Hub](https://hub.docker.com/r/localai/localai) and [Quay.io](https://quay.io/repository/go-skynet/local-ai).
-
-## Prerequisites
-
-Before you begin, ensure you have Docker or Podman installed:
-
- [Install Docker Desktop](https://docs.docker.com/get-docker/) (Mac, Windows, Linux)
- [Install Podman](https://podman.io/getting-started/installation) (Linux alternative)
- [Install Docker Engine](https://docs.docker.com/engine/install/) (Linux servers)
-
-## Quick Start
-
-The fastest way to get started is with the CPU image:
-
-```bash
-docker run -p 8080:8080 --name local-ai -ti localai/localai:latest
-```
-
-This will:
- Start LocalAI (you'll need to install models separately)
- Make the API available at `http://localhost:8080`
-
-{{% notice tip %}}
-**Docker Run vs Docker Start**
-
- `docker run` creates and starts a new container. If a container with the same name already exists, this command will fail.
- `docker start` starts an existing container that was previously created with `docker run`.
-
-If you've already run LocalAI before and want to start it again, use: `docker start -i local-ai`
-{{% /notice %}}
-
-## Image Types
-
-LocalAI provides several image types to suit different needs:
-
-### Standard Images
-
-Standard images don't include pre-configured models. Use these if you want to configure models manually.
-
-#### CPU Image
-
-```bash
-docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
-```
-
-#### GPU Images
-
-**NVIDIA CUDA 13:**
-```bash
-docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
-```
-
-**NVIDIA CUDA 12:**
-```bash
-docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
-```
-
-**AMD GPU (ROCm):**
-```bash
-docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
-```
-
-**Intel GPU:**
-```bash
-docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel
-```
-
-**Vulkan:**
-```bash
-docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
-```
-
-**NVIDIA Jetson (L4T ARM64):**
-
-CUDA 12 (for Nvidia AGX Orin and similar platforms):
-```bash
-docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64
-```
-
-CUDA 13 (for Nvidia DGX Spark):
-```bash
-docker run -ti --name local-ai -p 8080:8080 --runtime nvidia --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
-```
-
-### All-in-One (AIO) Images
-
-**Recommended for beginners** - These images come pre-configured with models and backends, ready to use immediately.
-
-#### CPU Image
-
-```bash
-docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
-```
-
-#### GPU Images
-
-**NVIDIA CUDA 13:**
-```bash
-docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13
-```
-
-**NVIDIA CUDA 12:**
-```bash
-docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
-```
-
-**AMD GPU (ROCm):**
-```bash
-docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas
-```
-
-**Intel GPU:**
-```bash
-docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel
-```
-
-## Using Docker Compose
-
-For a more manageable setup, especially with persistent volumes, use Docker Compose:
-
-```yaml
-version: "3.9"
-services:
-  api:
-    image: localai/localai:latest-aio-cpu
-    # For GPU support, use one of:
-    # image: localai/localai:latest-aio-gpu-nvidia-cuda-13
-    # image: localai/localai:latest-aio-gpu-nvidia-cuda-12
-    # image: localai/localai:latest-aio-gpu-nvidia-cuda-11
-    # image: localai/localai:latest-aio-gpu-hipblas
-    # image: localai/localai:latest-aio-gpu-intel
-    healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
-      interval: 1m
-      timeout: 20m
-      retries: 5
-    ports:
-      - 8080:8080
-    environment:
-      - DEBUG=false
-    volumes:
-      - ./models:/models:cached
-    # For NVIDIA GPUs, uncomment:
-    # deploy:
-    #   resources:
-    #     reservations:
-    #       devices:
-    #         - driver: nvidia
-    #           count: 1
-    #           capabilities: [gpu]
-```
-
-Save this as `docker-compose.yml` and run:
-
-```bash
-docker compose up -d
-```
-
-## Persistent Storage
-
-To persist models and configurations, mount a volume:
-
-```bash
-docker run -ti --name local-ai -p 8080:8080 \
-  -v $PWD/models:/models \
-  localai/localai:latest-aio-cpu
-```
-
-Or use a named volume:
-
-```bash
-docker volume create localai-models
-docker run -ti --name local-ai -p 8080:8080 \
-  -v localai-models:/models \
-  localai/localai:latest-aio-cpu
-```
-
-## What's Included in AIO Images
-
-All-in-One images come pre-configured with:
-
-- **Text Generation**: LLM models for chat and completion
-- **Image Generation**: Stable Diffusion models
-- **Text to Speech**: TTS models
-- **Speech to Text**: Whisper models
-- **Embeddings**: Vector embedding models
-- **Function Calling**: Support for OpenAI-compatible function calling
-
-The AIO images use OpenAI-compatible model names (like `gpt-4`, `gpt-4-vision-preview`) but are backed by open-source models. See the [container images documentation](/getting-started/container-images/#all-in-one-images) for the complete mapping.
-
-## Next Steps
-
-After installation:
-
-1. Access the WebUI at `http://localhost:8080`
-2. Check available models: `curl http://localhost:8080/v1/models`
-3. [Install additional models](/getting-started/models/)
-4. [Try out examples](/getting-started/try-it-out/)
-
-## Advanced Configuration
-
-For detailed information about:
-- All available image tags and versions
-- Advanced Docker configuration options
-- Custom image builds
-- Backend management
-
-See the [Container Images documentation](/getting-started/container-images/).
-
-## Troubleshooting
-
-### Container won't start
-
-- Check Docker is running: `docker ps`
-- Check port 8080 is available: `netstat -an | grep 8080` (Linux/Mac)
-- View logs: `docker logs local-ai`
-
-### GPU not detected
-
-- Ensure Docker has GPU access: `docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi`
-- For NVIDIA: Install [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)
-- For AMD: Ensure devices are accessible: `ls -la /dev/kfd /dev/dri`
-
-### Models not downloading
-
-- Check internet connection
-- Verify disk space: `df -h`
-- Check Docker logs for errors: `docker logs local-ai`
-
-## See Also
-
-- [Container Images Reference](/getting-started/container-images/) - Complete image reference
-- [Install Models](/getting-started/models/) - Install and configure models
-- [GPU Acceleration](/features/gpu-acceleration/) - GPU setup and optimization
-- [Kubernetes Installation](/installation/kubernetes/) - Deploy on Kubernetes
-
+See [Containers](/installation/containers/) for the complete guide to running LocalAI with Docker and Podman.
--- a/docs/content/installation/linux.md
+++ b/docs/content/installation/linux.md
@@ -1,69 +1,10 @@
 ---
 title: "Linux Installation"
-description: "Install LocalAI on Linux using the installer script or binaries"
+description: "Install LocalAI on Linux using binaries"
 weight: 3
 url: '/installation/linux/'
 ---

-
-## One-Line Installer
-
-{{% notice warning %}}
-**The `install.sh` script is currently experiencing issues and may produce broken or misconfigured installations. Please use alternative installation methods (Docker or manual binary installation) until [issue #8032](https://github.com/mudler/LocalAI/issues/8032) is resolved.**
-{{% /notice %}}
-
-The fastest way to install LocalAI on Linux is with the installation script:
-
-```bash
-curl https://localai.io/install.sh | sh
-```
-
-This script will:
-- Detect your system architecture
-- Download the appropriate LocalAI binary
-- Set up the necessary configuration
-- Start LocalAI automatically
-
-### Installer Configuration Options
-
-The installer can be configured using environment variables:
-
-```bash
-curl https://localai.io/install.sh | VAR=value sh
-```
-
-#### Environment Variables
-
-| Environment Variable | Description |
-|----------------------|-------------|
-| **DOCKER_INSTALL** | Set to `"true"` to enable the installation of Docker images |
-| **USE_AIO** | Set to `"true"` to use the all-in-one LocalAI Docker image |
-| **USE_VULKAN** | Set to `"true"` to use Vulkan GPU support |
-| **API_KEY** | Specify an API key for accessing LocalAI, if required |
-| **PORT** | Specifies the port on which LocalAI will run (default is 8080) |
-| **THREADS** | Number of processor threads the application should use. Defaults to the number of logical cores minus one |
-| **VERSION** | Specifies the version of LocalAI to install. Defaults to the latest available version |
-| **MODELS_PATH** | Directory path where LocalAI models are stored (default is `/var/lib/local-ai/models`) |
-| **P2P_TOKEN** | Token to use for the federation or for starting workers. See [distributed inferencing documentation]({{%relref "features/distributed_inferencing" %}}) |
-| **WORKER** | Set to `"true"` to make the instance a worker (p2p token is required) |
-| **FEDERATED** | Set to `"true"` to share the instance with the federation (p2p token is required) |
-| **FEDERATED_SERVER** | Set to `"true"` to run the instance as a federation server which forwards requests to the federation (p2p token is required) |
-
-#### Image Selection
-
-The installer will automatically detect your GPU and select the appropriate image. By default, it uses the standard images without extra Python dependencies. You can customize the image selection:
-
-- `USE_AIO=true`: Use all-in-one images that include all dependencies
-- `USE_VULKAN=true`: Use Vulkan GPU support instead of vendor-specific GPU support
-
-#### Uninstallation
-
-To uninstall LocalAI installed via the script:
-
-```bash
-curl https://localai.io/install.sh | sh -s -- --uninstall
-```
-
 ## Manual Installation

 ### Download Binary
--- a/docs/content/reference/cli-reference.md
+++ b/docs/content/reference/cli-reference.md
@@ -46,6 +46,7 @@ Complete reference for all LocalAI command-line interface (CLI) parameters and e
 | `--watchdog-idle-timeout` | `15m` | Threshold beyond which an idle backend should be stopped | `$LOCALAI_WATCHDOG_IDLE_TIMEOUT`, `$WATCHDOG_IDLE_TIMEOUT` |
 | `--enable-watchdog-busy` | `false` | Enable watchdog for stopping backends that are busy longer than the watchdog-busy-timeout | `$LOCALAI_WATCHDOG_BUSY`, `$WATCHDOG_BUSY` |
 | `--watchdog-busy-timeout` | `5m` | Threshold beyond which a busy backend should be stopped | `$LOCALAI_WATCHDOG_BUSY_TIMEOUT`, `$WATCHDOG_BUSY_TIMEOUT` |
+| `--watchdog-interval` | `500ms` | Interval between watchdog checks (e.g., `500ms`, `5s`, `1m`) | `$LOCALAI_WATCHDOG_INTERVAL`, `$WATCHDOG_INTERVAL` |
 | `--force-eviction-when-busy` | `false` | Force eviction even when models have active API calls (default: false for safety). **Warning:** Enabling this can interrupt active requests | `$LOCALAI_FORCE_EVICTION_WHEN_BUSY`, `$FORCE_EVICTION_WHEN_BUSY` |
 | `--lru-eviction-max-retries` | `30` | Maximum number of retries when waiting for busy models to become idle before eviction | `$LOCALAI_LRU_EVICTION_MAX_RETRIES`, `$LRU_EVICTION_MAX_RETRIES` |
 | `--lru-eviction-retry-interval` | `1s` | Interval between retries when waiting for busy models to become idle (e.g., `1s`, `2s`) | `$LOCALAI_LRU_EVICTION_RETRY_INTERVAL`, `$LRU_EVICTION_RETRY_INTERVAL` |
--- a/docs/data/version.json
+++ b/docs/data/version.json
@@ -1,3 +1,3 @@
 {
-  "version": "v3.12.0"
+  "version": "v3.12.1"
 }
--- a/docs/static/install.sh
+++ b/docs/static/install.sh
@@ -1,922 +0,0 @@
-#!/bin/sh
-# LocalAI Installer Script
-# This script installs LocalAI on Linux and macOS systems.
-# It automatically detects the system architecture and installs the appropriate version.
-
-# Usage:
-#   Basic installation:
-#     curl https://localai.io/install.sh | sh
-#
-#   With environment variables:
-#     DOCKER_INSTALL=true USE_AIO=true API_KEY=your-key PORT=8080 THREADS=4 curl https://localai.io/install.sh | sh
-#
-#   To uninstall:
-#     curl https://localai.io/install.sh | sh -s -- --uninstall
-#
-# Environment Variables:
-#   DOCKER_INSTALL - Set to "true" to install Docker images (default: auto-detected)
-#   USE_AIO       - Set to "true" to use the all-in-one LocalAI image (default: false)
-#   USE_VULKAN    - Set to "true" to use Vulkan GPU support (default: false)
-#   API_KEY       - API key for securing LocalAI access (default: none)
-#   PORT          - Port to run LocalAI on (default: 8080)
-#   THREADS       - Number of CPU threads to use (default: auto-detected)
-#   MODELS_PATH   - Path to store models (default: /var/lib/local-ai/models)
-#   CORE_IMAGES   - Set to "true" to download core LocalAI images (default: false)
-#   P2P_TOKEN     - Token for P2P federation/worker mode (default: none)
-#   WORKER        - Set to "true" to run as a worker node (default: false)
-#   FEDERATED     - Set to "true" to enable federation mode (default: false)
-#   FEDERATED_SERVER - Set to "true" to run as a federation server (default: false)
-
-set -e
-set -o noglob
-#set -x
-
-# --- helper functions for logs ---
-# ANSI escape codes
-LIGHT_BLUE='\033[38;5;117m'
-ORANGE='\033[38;5;214m'
-RED='\033[38;5;196m'
-BOLD='\033[1m'
-RESET='\033[0m'
-
-ECHO=`which echo || true`
-if [ -z "$ECHO" ]; then
-    ECHO=echo
-else
-    ECHO="$ECHO -e"
-fi
-
-info()
-{
-    ${ECHO} "${BOLD}${LIGHT_BLUE}" '[INFO] ' "$@" "${RESET}"
-}
-
-warn()
-{
-    ${ECHO} "${BOLD}${ORANGE}" '[WARN] ' "$@" "${RESET}" >&2
-}
-
-fatal()
-{
-    ${ECHO} "${BOLD}${RED}" '[ERROR] ' "$@" "${RESET}" >&2
-    exit 1
-}
-
-# --- custom choice functions ---
-# like the logging functions, but with the -n flag to prevent the new line and keep the cursor in line for choices inputs like y/n
-choice_info()
-{
-    ${ECHO} -n "${BOLD}${LIGHT_BLUE}" '[INFO] ' "$@" "${RESET}"
-}
-
-choice_warn()
-{
-    ${ECHO} -n "${BOLD}${ORANGE}" '[WARN] ' "$@" "${RESET}" >&2
-}
-
-choice_fatal()
-{
-    ${ECHO} -n "${BOLD}${RED}" '[ERROR] ' "$@" "${RESET}" >&2
-    exit 1
-}
-
-# --- fatal if no systemd or openrc ---
-verify_system() {
-    if [ -x /sbin/openrc-run ]; then
-        HAS_OPENRC=true
-        return
-    fi
-    if [ -x /bin/systemctl ] || type systemctl > /dev/null 2>&1; then
-        HAS_SYSTEMD=true
-        return
-    fi
-    fatal 'Can not find systemd or openrc to use as a process supervisor for local-ai.'
-}
-
-TEMP_DIR=$(mktemp -d)
-cleanup() { rm -rf $TEMP_DIR; }
-trap cleanup EXIT
-
-available() { command -v $1 >/dev/null; }
-require() {
-    local MISSING=''
-    for TOOL in $*; do
-        if ! available $TOOL; then
-            MISSING="$MISSING $TOOL"
-        fi
-    done
-
-    echo $MISSING
-}
-
-# Function to uninstall LocalAI
-uninstall_localai() {
-    info "Starting LocalAI uninstallation..."
-
-    # Stop and remove Docker container if it exists
-    if available docker && $SUDO docker ps -a --format '{{.Names}}' | grep -q local-ai; then
-        info "Stopping and removing LocalAI Docker container..."
-        $SUDO docker stop local-ai || true
-        $SUDO docker rm local-ai || true
-        $SUDO docker volume rm local-ai-data || true
-    fi
-
-    # Remove systemd service if it exists
-    if [ -f "/etc/systemd/system/local-ai.service" ]; then
-        info "Removing systemd service..."
-        $SUDO systemctl stop local-ai || true
-        $SUDO systemctl disable local-ai || true
-        $SUDO rm -f /etc/systemd/system/local-ai.service
-        $SUDO systemctl daemon-reload
-    fi
-
-    # Remove environment file
-    if [ -f "/etc/localai.env" ]; then
-        info "Removing environment file..."
-        $SUDO rm -f /etc/localai.env
-    fi
-
-    # Remove binary
-    for BINDIR in /usr/local/bin /usr/bin /bin; do
-        if [ -f "$BINDIR/local-ai" ]; then
-            info "Removing binary from $BINDIR..."
-            $SUDO rm -f "$BINDIR/local-ai"
-        fi
-    done
-
-    # Remove local-ai user and all its data if it exists
-    if id local-ai >/dev/null 2>&1; then
-        info "Removing local-ai user and all its data..."
-        $SUDO gpasswd -d $(whoami) local-ai
-        $SUDO userdel -r local-ai || true
-    fi
-
-    info "LocalAI has been successfully uninstalled."
-    exit 0
-}
-
-
-
-## VARIABLES
-
-# DOCKER_INSTALL - set to "true" to install Docker images
-# USE_AIO - set to "true" to install the all-in-one LocalAI image
-# USE_VULKAN - set to "true" to use Vulkan GPU support
-PORT=${PORT:-8080}
-
-docker_found=false
-if available docker ; then
-    info "Docker detected."
-    docker_found=true
-    if [ -z $DOCKER_INSTALL ]; then
-        info "Docker detected and no installation method specified. Using Docker."
-    fi
-fi
-
-DOCKER_INSTALL=${DOCKER_INSTALL:-$docker_found}
-USE_AIO=${USE_AIO:-false}
-USE_VULKAN=${USE_VULKAN:-false}
-API_KEY=${API_KEY:-}
-CORE_IMAGES=${CORE_IMAGES:-false}
-P2P_TOKEN=${P2P_TOKEN:-}
-WORKER=${WORKER:-false}
-FEDERATED=${FEDERATED:-false}
-FEDERATED_SERVER=${FEDERATED_SERVER:-false}
-
-# nprocs -1
-if available nproc; then
-    procs=$(nproc)
-else
-    procs=1
-fi
-THREADS=${THREADS:-$procs}
-LATEST_VERSION=$(curl -s "https://api.github.com/repos/mudler/LocalAI/releases/latest" | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
-LOCALAI_VERSION="${LOCALAI_VERSION:-$LATEST_VERSION}" #changed due to VERSION beign already defined in Fedora 42 Cloud Edition
-MODELS_PATH=${MODELS_PATH:-/var/lib/local-ai/models}
-
-
-check_gpu() {
-    # Look for devices based on vendor ID for NVIDIA and AMD
-    case $1 in
-        lspci)
-            case $2 in
-                nvidia) available lspci && lspci -d '10de:' | grep -q 'NVIDIA' || return 1 ;;
-                amdgpu) available lspci && lspci -d '1002:' | grep -q 'AMD' || return 1 ;;
-                intel) available lspci && lspci | grep -E 'VGA|3D' | grep -iq intel | return 1 ;;
-            esac ;;
-        lshw)
-            case $2 in
-                nvidia) available lshw && $SUDO lshw -c display -numeric | grep -q 'vendor: .* \[10DE\]' || return 1 ;;
-                amdgpu) available lshw && $SUDO lshw -c display -numeric | grep -q 'vendor: .* \[1002\]' || return 1 ;;
-                intel) available lshw  && $SUDO lshw -c display -numeric | grep -q 'vendor: .* \[8086\]' || return 1 ;;
-            esac ;;
-        nvidia-smi) available nvidia-smi || return 1 ;;
-    esac
-}
-
-
-install_success() {
-    info "The LocalAI API is now available at 127.0.0.1:$PORT."
-    if [ "$DOCKER_INSTALL" = "true" ]; then
-        info "The LocalAI Docker container is now running."
-    else
-        info 'Install complete. Run "local-ai" from the command line.'
-    fi
-}
-
-aborted() {
-    warn 'Installation aborted.'
-    exit 1
-}
-
-trap aborted INT
-
-configure_systemd() {
-    if ! id local-ai >/dev/null 2>&1; then
-        info "Creating local-ai user..."
-        $SUDO useradd -r -s /bin/false -U -M -d /var/lib/local-ai local-ai
-        $SUDO mkdir -p /var/lib/local-ai
-        $SUDO chmod 0755 /var/lib/local-ai
-        $SUDO chown local-ai:local-ai /var/lib/local-ai
-    fi
-
-    info "Adding current user to local-ai group..."
-    $SUDO usermod -a -G local-ai $(whoami)
-    info "Creating local-ai systemd service..."
-    cat <<EOF | $SUDO tee /etc/systemd/system/local-ai.service >/dev/null
-[Unit]
-Description=LocalAI Service
-After=network-online.target
-
-[Service]
-ExecStart=$BINDIR/local-ai $STARTCOMMAND
-User=local-ai
-Group=local-ai
-Restart=always
-EnvironmentFile=/etc/localai.env
-RestartSec=3
-Environment="PATH=$PATH"
-WorkingDirectory=/var/lib/local-ai
-
-[Install]
-WantedBy=default.target
-EOF
-
-    $SUDO touch /etc/localai.env
-    $SUDO echo "ADDRESS=0.0.0.0:$PORT" | $SUDO tee /etc/localai.env >/dev/null
-    $SUDO echo "API_KEY=$API_KEY" | $SUDO tee -a /etc/localai.env >/dev/null
-    $SUDO echo "THREADS=$THREADS" | $SUDO tee -a /etc/localai.env >/dev/null
-    $SUDO echo "MODELS_PATH=$MODELS_PATH" | $SUDO tee -a /etc/localai.env >/dev/null
-
-    if [ -n "$P2P_TOKEN" ]; then
-        $SUDO echo "LOCALAI_P2P_TOKEN=$P2P_TOKEN" | $SUDO tee -a /etc/localai.env >/dev/null
-        $SUDO echo "LOCALAI_P2P=true" | $SUDO tee -a /etc/localai.env >/dev/null
-    fi
-
-    if [ "$LOCALAI_P2P_DISABLE_DHT" = true ]; then
-        $SUDO echo "LOCALAI_P2P_DISABLE_DHT=true" | $SUDO tee -a /etc/localai.env >/dev/null
-    fi
-
-    SYSTEMCTL_RUNNING="$(systemctl is-system-running || true)"
-    case $SYSTEMCTL_RUNNING in
-        running|degraded)
-            info "Enabling and starting local-ai service..."
-            $SUDO systemctl daemon-reload
-            $SUDO systemctl enable local-ai
-
-            start_service() { $SUDO systemctl restart local-ai; }
-            trap start_service EXIT
-            ;;
-    esac
-}
-
-
-
-# ref: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-yum-or-dnf
-install_container_toolkit_yum() {
-    info 'Installing NVIDIA container toolkit repository...'
-
-    curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
-    $SUDO  tee /etc/yum.repos.d/nvidia-container-toolkit.repo
-
-    if [ "$PACKAGE_MANAGER" = "dnf" ]; then
-        DNF_VERSION=$($PACKAGE_MANAGER --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1 | cut -d. -f1)
-        if [ "$DNF_VERSION" -ge 5 ]; then
-            # DNF5: Use 'setopt' to enable the repository
-            $SUDO $PACKAGE_MANAGER config-manager setopt nvidia-container-toolkit-experimental.enabled=1
-        else
-            # DNF4: Use '--set-enabled' to enable the repository
-            $SUDO $PACKAGE_MANAGER config-manager --enable nvidia-container-toolkit-experimental
-        fi
-    else
-        $SUDO $PACKAGE_MANAGER -y install yum-utils
-        $SUDO $PACKAGE_MANAGER-config-manager --enable nvidia-container-toolkit-experimental
-    fi
-    $SUDO $PACKAGE_MANAGER install -y nvidia-container-toolkit
-}
-
-# Fedora, Rhel and other distro ships tunable SELinux booleans in the container-selinux policy to control device access.
-# In particular, enabling container_use_devices allows containers to use arbitrary host device labels (including GPU devices)
-# ref: https://github.com/containers/ramalama/blob/main/docs/ramalama-cuda.7.md#expected-output
-enable_selinux_container_booleans() {
-
-    # Check SELinux mode
-    SELINUX_MODE=$(getenforce)
-
-    if [ "$SELINUX_MODE" == "Enforcing" ]; then
-        # Check the status of container_use_devices
-        CONTAINER_USE_DEVICES=$(getsebool container_use_devices | awk '{print $3}')
-
-       if [ "$CONTAINER_USE_DEVICES" == "off" ]; then
-
-          #We want to give the user the choice to enable the SE booleans since it is a security config
-          warn "+-----------------------------------------------------------------------------------------------------------+"
-          warn "| WARNING:                                                                                                  |"
-          warn "| Your distribution ships tunable SELinux booleans in the container-selinux policy to control device access.|"
-          warn "| In particular, enabling \"container_use_devices\" allows containers to use arbitrary host device labels   |"
-          warn "| (including GPU devices).                                                                                  |"
-          warn "| This script can try to enable them enabling the \"container_use_devices\" flag.                           |"
-          warn "|                                                                                                           |"
-          warn "| Otherwise you can exit the install script and enable them yourself.                                       |"
-          warn "+-----------------------------------------------------------------------------------------------------------+"
-
-          while true; do
-              choice_warn "I understand that this script is going to change my SELinux configs, which is a security risk: (yes/exit) ";
-              read  Answer
-
-              if [ "$Answer" = "yes" ]; then
-                warn "Enabling \"container_use_devices\" persistently..."
-                $SUDO setsebool -P container_use_devices 1
-
-                break
-              elif [ "$Answer" = "exit" ]; then
-                  aborted
-              else
-                  warn "Invalid choice. Please enter 'yes' or 'exit'."
-              fi
-            done
-       fi
-    fi
-}
-
-# ref: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-apt
-install_container_toolkit_apt() {
-    info 'Installing NVIDIA container toolkit repository...'
-
-    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | $SUDO gpg --dearmor -o /etc/apt/trusted.gpg.d/nvidia-container-toolkit-keyring.gpg \
-  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
-    $SUDO tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
-
-    $SUDO apt-get update && $SUDO apt-get install -y nvidia-container-toolkit
-}
-
-# ref: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-zypper
-install_container_toolkit_zypper() {
-    info 'Installing NVIDIA zypper repository...'
-    $SUDO zypper ar https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
-    $SUDO zypper modifyrepo --enable nvidia-container-toolkit-experimental
-    $SUDO zypper --gpg-auto-import-keys install -y nvidia-container-toolkit
-}
-
-install_container_toolkit() {
-    if [ ! -f "/etc/os-release" ]; then
-        fatal "Unknown distribution. Skipping CUDA installation."
-    fi
-
-    ## Check if it's already installed
-    if check_gpu nvidia-smi && available nvidia-container-runtime; then
-        info "NVIDIA Container Toolkit already installed."
-        return
-    fi
-
-    . /etc/os-release
-
-    OS_NAME=$ID
-    OS_VERSION=$VERSION_ID
-
-    info "Installing NVIDIA Container Toolkit..."
-    case $OS_NAME in
-            amzn|fedora|rocky|centos|rhel) install_container_toolkit_yum ;;
-            debian|ubuntu) install_container_toolkit_apt ;;
-            opensuse*|suse*) install_container_toolkit_zypper ;;
-            *) echo "Could not install nvidia container toolkit - unknown OS" ;;
-    esac
-
-    # after installing the toolkit we need to add it to the docker runtimes, otherwise even with --gpu all
-    # the container would still run with runc and would not have access to nvidia-smi
-    # ref: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-docker
-    info "Adding NVIDIA Container Runtime to Docker runtimes..."
-    $SUDO nvidia-ctk runtime configure --runtime=docker
-
-    info "Restarting Docker Daemon"
-    $SUDO systemctl restart docker
-
-    # The NVML error arises because SELinux blocked the container's attempts to open the GPU devices or related libraries.
-    # Without relaxing SELinux for the container, GPU commands like nvidia-smi report "Insufficient Permissions"
-    # This has been noted in NVIDIA's documentation:
-    # ref: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.13.5/install-guide.html#id2
-    # ref: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html#nvml-insufficient-permissions-and-selinux
-    case $OS_NAME in
-            fedora|rhel|centos|rocky)
-                enable_selinux_container_booleans
-                ;;
-            opensuse-tumbleweed)
-                enable_selinux_container_booleans
-                ;;
-    esac
-}
-
-# ref: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#rhel-7-centos-7
-# ref: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#rhel-8-rocky-8
-# ref: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#rhel-9-rocky-9
-# ref: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#fedora
-install_cuda_driver_yum() {
-    info 'Installing NVIDIA CUDA repository...'
-    case $PACKAGE_MANAGER in
-        yum)
-            $SUDO $PACKAGE_MANAGER -y install yum-utils
-            $SUDO $PACKAGE_MANAGER-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/$1$2/$(uname -m)/cuda-$1$2.repo
-            ;;
-        dnf)
-            DNF_VERSION=$($PACKAGE_MANAGER --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1 | cut -d. -f1)
-            if [ "$DNF_VERSION" -ge 5 ]; then
-                # DNF5: Use 'addrepo' to add the repository
-                $SUDO $PACKAGE_MANAGER config-manager addrepo --id=nvidia-cuda --set=name="nvidia-cuda" --set=baseurl="https://developer.download.nvidia.com/compute/cuda/repos/$1$2/$(uname -m)/cuda-$1$2.repo"
-            else
-                # DNF4: Use '--add-repo' to add the repository
-                $SUDO $PACKAGE_MANAGER config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/$1$2/$(uname -m)/cuda-$1$2.repo
-            fi
-            ;;
-    esac
-
-    case $1 in
-        rhel)
-            info 'Installing EPEL repository...'
-            # EPEL is required for third-party dependencies such as dkms and libvdpau
-            $SUDO $PACKAGE_MANAGER -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-$2.noarch.rpm || true
-            ;;
-    esac
-
-    info 'Installing CUDA driver...'
-
-    if [ "$1" = 'centos' ] || [ "$1$2" = 'rhel7' ]; then
-        $SUDO $PACKAGE_MANAGER -y install nvidia-driver-latest-dkms
-    fi
-
-    $SUDO $PACKAGE_MANAGER -y install cuda-drivers
-}
-
-install_fedora_nvidia_kernel_drivers(){
-
-  #We want to give the user the choice to install the akmod kernel drivers or not, since it could break some setups
-  warn "+------------------------------------------------------------------------------------------------+"
-  warn "| WARNING:                                                                                       |"
-  warn "| Looks like the NVIDIA Kernel modules are not installed.                                        |"
-  warn "|                                                                                                |"
-  warn "| This script can try to install them using akmod-nvidia.                                        |"
-  warn "| - The script need the rpmfusion free and nonfree repos and will install them if not available. |"
-  warn "| - The akmod installation can sometimes inhibit the reboot command.                             |"
-  warn "|                                                                                                |"
-  warn "| Otherwise you can exit the install script and install them yourself.                           |"
-  warn "| NOTE: you will need to reboot after the installation.                                          |"
-  warn "+------------------------------------------------------------------------------------------------+"
-
-  while true; do
-    choice_warn "Do you wish for the script to try and install them? (akmod/exit) ";
-    read  Answer
-
-    if [ "$Answer" = "akmod" ]; then
-
-      DNF_VERSION=$($PACKAGE_MANAGER --version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1 | cut -d. -f1)
-
-      OS_NAME=$ID
-      OS_VERSION=$VERSION_ID
-      FREE_URL="https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-${OS_VERSION}.noarch.rpm"
-      NONFREE_URL="https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-${OS_VERSION}.noarch.rpm"
-
-      curl -LO "$FREE_URL"
-      curl -LO "$NONFREE_URL"
-
-      if [ "$DNF_VERSION" -ge 5 ]; then
-          # DNF5:
-          $SUDO $PACKAGE_MANAGER install -y "rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm" "rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm"
-          $SUDO $PACKAGE_MANAGER install -y akmod-nvidia
-      else
-          # DNF4:
-          $SUDO $PACKAGE_MANAGER install -y "rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm" "rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm"
-          $SUDO $PACKAGE_MANAGER install -y akmod-nvidia
-      fi
-
-      $SUDO rm "rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm"
-      $SUDO rm "rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm"
-
-      install_cuda_driver_yum $OS_NAME '41'
-
-      info "Nvidia driver installation complete, please reboot now and run the Install script again to complete the setup."
-      exit
-
-    elif [ "$Answer" = "exit" ]; then
-
-        aborted
-    else
-        warn "Invalid choice. Please enter 'akmod' or 'exit'."
-    fi
-  done
-}
-
-# ref: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu
-# ref: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#debian
-install_cuda_driver_apt() {
-    info 'Installing NVIDIA CUDA repository...'
-    curl -fsSL -o $TEMP_DIR/cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/$1$2/$(uname -m)/cuda-keyring_1.1-1_all.deb
-
-    case $1 in
-        debian)
-            info 'Enabling contrib sources...'
-            $SUDO sed 's/main/contrib/' < /etc/apt/sources.list | $SUDO tee /etc/apt/sources.list.d/contrib.list > /dev/null
-            if [ -f "/etc/apt/sources.list.d/debian.sources" ]; then
-                $SUDO sed 's/main/contrib/' < /etc/apt/sources.list.d/debian.sources | $SUDO tee /etc/apt/sources.list.d/contrib.sources > /dev/null
-            fi
-            ;;
-    esac
-
-    info 'Installing CUDA driver...'
-    $SUDO dpkg -i $TEMP_DIR/cuda-keyring.deb
-    $SUDO apt-get update
-
-    [ -n "$SUDO" ] && SUDO_E="$SUDO -E" || SUDO_E=
-    DEBIAN_FRONTEND=noninteractive $SUDO_E apt-get -y install cuda-drivers -q
-}
-
-install_cuda() {
-    if [ ! -f "/etc/os-release" ]; then
-        fatal "Unknown distribution. Skipping CUDA installation."
-    fi
-
-    . /etc/os-release
-
-    OS_NAME=$ID
-    OS_VERSION=$VERSION_ID
-
-    if [ -z "$PACKAGE_MANAGER" ]; then
-        fatal "Unknown package manager. Skipping CUDA installation."
-    fi
-
-    if ! check_gpu nvidia-smi || [ -z "$(nvidia-smi | grep -o "CUDA Version: [0-9]*\.[0-9]*")" ]; then
-        case $OS_NAME in
-            centos|rhel) install_cuda_driver_yum 'rhel' $(echo $OS_VERSION | cut -d '.' -f 1) ;;
-            rocky) install_cuda_driver_yum 'rhel' $(echo $OS_VERSION | cut -c1) ;;
-            fedora) [ $OS_VERSION -lt '41' ] && install_cuda_driver_yum $OS_NAME $OS_VERSION || install_cuda_driver_yum $OS_NAME '41';;
-            amzn) install_cuda_driver_yum 'fedora' '37' ;;
-            debian) install_cuda_driver_apt $OS_NAME $OS_VERSION ;;
-            ubuntu) install_cuda_driver_apt $OS_NAME $(echo $OS_VERSION | sed 's/\.//') ;;
-            *) exit ;;
-        esac
-    fi
-
-    if ! lsmod | grep -q nvidia || ! lsmod | grep -q nvidia_uvm; then
-        KERNEL_RELEASE="$(uname -r)"
-        case $OS_NAME in
-            rocky) $SUDO $PACKAGE_MANAGER -y install kernel-devel kernel-headers ;;
-            centos|rhel|amzn) $SUDO $PACKAGE_MANAGER -y install kernel-devel-$KERNEL_RELEASE kernel-headers-$KERNEL_RELEASE ;;
-            fedora) $SUDO $PACKAGE_MANAGER -y install kernel-devel-$KERNEL_RELEASE ;;
-            debian|ubuntu) $SUDO apt-get -y install linux-headers-$KERNEL_RELEASE ;;
-            *) exit ;;
-        esac
-
-        NVIDIA_CUDA_VERSION=$($SUDO dkms info | awk -F: '/added/ { print $1 }')
-        if [ -n "$NVIDIA_CUDA_VERSION" ]; then
-            $SUDO dkms install $NVIDIA_CUDA_VERSION
-        fi
-
-        if lsmod | grep -q nouveau; then
-            info 'Reboot to complete NVIDIA CUDA driver install.'
-            exit 0
-        fi
-
-        $SUDO modprobe nvidia
-        $SUDO modprobe nvidia_uvm
-    fi
-
-    # make sure the NVIDIA modules are loaded on boot with nvidia-persistenced
-    if command -v nvidia-persistenced > /dev/null 2>&1; then
-        $SUDO touch /etc/modules-load.d/nvidia.conf
-        MODULES="nvidia nvidia-uvm"
-        for MODULE in $MODULES; do
-            if ! grep -qxF "$MODULE" /etc/modules-load.d/nvidia.conf; then
-                echo "$MODULE" | sudo tee -a /etc/modules-load.d/nvidia.conf > /dev/null
-            fi
-        done
-    fi
-
-    info "NVIDIA GPU ready."
-    install_success
-
-}
-
-install_amd() {
-    # Look for pre-existing ROCm v6 before downloading the dependencies
-    for search in "${HIP_PATH:-''}" "${ROCM_PATH:-''}" "/opt/rocm" "/usr/lib64"; do
-        if [ -n "${search}" ] && [ -e "${search}/libhipblas.so.2" -o -e "${search}/lib/libhipblas.so.2" ]; then
-            info "Compatible AMD GPU ROCm library detected at ${search}"
-            install_success
-            exit 0
-        fi
-    done
-
-    info "AMD GPU ready."
-    exit 0
-}
-
-install_docker() {
-    [ "$(uname -s)" = "Linux" ] || fatal 'This script is intended to run on Linux only.'
-
-    if ! available docker; then
-        info "Installing Docker..."
-        curl -fsSL https://get.docker.com | sh
-    fi
-
-    # Check docker is running
-    if ! $SUDO systemctl is-active --quiet docker; then
-        info "Starting Docker..."
-        $SUDO systemctl start docker
-    fi
-
-    info "Creating LocalAI Docker volume..."
-    # Create volume if doesn't exist already
-    if ! $SUDO docker volume inspect local-ai-data > /dev/null 2>&1; then
-        $SUDO docker volume create local-ai-data
-    fi
-
-    # Check if container is already running
-    if $SUDO docker ps -a --format '{{.Names}}' | grep -q local-ai; then
-        info "LocalAI Docker container already exists, replacing it..."
-        $SUDO docker rm -f local-ai
-    fi
-
-    envs=""
-    if [ -n "$P2P_TOKEN" ]; then
-        envs="-e LOCALAI_P2P_TOKEN=$P2P_TOKEN -e LOCALAI_P2P=true"
-    fi
-    if [ "$LOCALAI_P2P_DISABLE_DHT" = true ]; then
-        envs="$envs -e LOCALAI_P2P_DISABLE_DHT=true"
-    fi
-
-    IMAGE_TAG=
-    if [ "$USE_VULKAN" = true ]; then
-        IMAGE_TAG=${LOCALAI_VERSION}-gpu-vulkan
-
-        info "Starting LocalAI Docker container..."
-        $SUDO docker run -v local-ai-data:/models \
-            --device /dev/dri \
-            --restart=always \
-            -e API_KEY=$API_KEY \
-            -e THREADS=$THREADS \
-            $envs \
-            -d -p $PORT:8080 --name local-ai localai/localai:$IMAGE_TAG $STARTCOMMAND
-    elif [ "$HAS_CUDA" ]; then
-        # Default to CUDA 12
-        IMAGE_TAG=${LOCALAI_VERSION}-gpu-nvidia-cuda-12
-        # AIO
-        if [ "$USE_AIO" = true ]; then
-            IMAGE_TAG=${LOCALAI_VERSION}-aio-gpu-nvidia-cuda-12
-        fi
-
-        info "Checking Nvidia Kernel Drivers presence..."
-        if ! available nvidia-smi; then
-          OS_NAME=$ID
-          OS_VERSION=$VERSION_ID
-
-            case $OS_NAME in
-                debian|ubuntu) $SUDO apt-get -y install nvidia-cuda-toolkit;;
-                fedora) install_fedora_nvidia_kernel_drivers;;
-            esac
-        fi
-
-        info "Starting LocalAI Docker container..."
-        $SUDO docker run -v local-ai-data:/models \
-            --gpus all \
-            --restart=always \
-            -e API_KEY=$API_KEY \
-            -e THREADS=$THREADS \
-            $envs \
-            -d -p $PORT:8080 --name local-ai localai/localai:$IMAGE_TAG $STARTCOMMAND
-    elif [ "$HAS_AMD" ]; then
-        IMAGE_TAG=${LOCALAI_VERSION}-gpu-hipblas
-        # AIO
-        if [ "$USE_AIO" = true ]; then
-            IMAGE_TAG=${LOCALAI_VERSION}-aio-gpu-hipblas
-        fi
-
-        info "Starting LocalAI Docker container..."
-        $SUDO docker run -v local-ai-data:/models \
-            --device /dev/dri \
-            --device /dev/kfd \
-            --group-add=video \
-            --restart=always \
-            -e API_KEY=$API_KEY \
-            -e THREADS=$THREADS \
-            $envs \
-            -d -p $PORT:8080 --name local-ai localai/localai:$IMAGE_TAG $STARTCOMMAND
-    elif [ "$HAS_INTEL" ]; then
-        IMAGE_TAG=${LOCALAI_VERSION}-gpu-intel
-        # AIO
-        if [ "$USE_AIO" = true ]; then
-            IMAGE_TAG=${LOCALAI_VERSION}-aio-gpu-intel
-        fi
-
-        info "Starting LocalAI Docker container..."
-        $SUDO docker run -v local-ai-data:/models \
-            --device /dev/dri \
-            --restart=always \
-            -e API_KEY=$API_KEY \
-            -e THREADS=$THREADS \
-            $envs \
-            -d -p $PORT:8080 --name local-ai localai/localai:$IMAGE_TAG $STARTCOMMAND
-
-    else
-        IMAGE_TAG=${LOCALAI_VERSION}
-
-        # AIO
-        if [ "$USE_AIO" = true ]; then
-            IMAGE_TAG=${LOCALAI_VERSION}-aio-cpu
-        fi
-
-        info "Starting LocalAI Docker container..."
-        $SUDO docker run -v local-ai-data:/models \
-                --restart=always \
-                -e MODELS_PATH=/models \
-                -e API_KEY=$API_KEY \
-                -e THREADS=$THREADS \
-                $envs \
-                -d -p $PORT:8080 --name local-ai localai/localai:$IMAGE_TAG $STARTCOMMAND
-    fi
-
-    install_success
-    exit 0
-}
-
-install_binary_darwin() {
-    [ "$(uname -s)" = "Darwin" ] || fatal 'This script is intended to run on macOS only.'
-
-    info "Downloading LocalAI ${LOCALAI_VERSION}..."
-    curl --fail --show-error --location --progress-bar -o $TEMP_DIR/local-ai "https://github.com/mudler/LocalAI/releases/download/${LOCALAI_VERSION}/local-ai-${LOCALAI_VERSION}-darwin-${ARCH}"
-
-    info "Installing to /usr/local/bin/local-ai"
-    install -o0 -g0 -m755 $TEMP_DIR/local-ai /usr/local/bin/local-ai
-
-    install_success
-}
-
-install_binary() {
-    [ "$(uname -s)" = "Linux" ] || fatal 'This script is intended to run on Linux only.'
-
-
-    IS_WSL2=false
-
-    KERN=$(uname -r)
-    case "$KERN" in
-        *icrosoft*WSL2 | *icrosoft*wsl2) IS_WSL2=true;;
-        *icrosoft) fatal "Microsoft WSL1 is not currently supported. Please upgrade to WSL2 with 'wsl --set-version <distro> 2'" ;;
-        *) ;;
-    esac
-
-
-    NEEDS=$(require curl awk grep sed tee xargs)
-    if [ -n "$NEEDS" ]; then
-        info "ERROR: The following tools are required but missing:"
-        for NEED in $NEEDS; do
-            echo "  - $NEED"
-        done
-        exit 1
-    fi
-
-    info "Downloading LocalAI ${LOCALAI_VERSION}..."
-    curl --fail --location --progress-bar -o $TEMP_DIR/local-ai "https://github.com/mudler/LocalAI/releases/download/${LOCALAI_VERSION}/local-ai-${LOCALAI_VERSION}-linux-${ARCH}"
-
-    for BINDIR in /usr/local/bin /usr/bin /bin; do
-        echo $PATH | grep -q $BINDIR && break || continue
-    done
-
-    info "Installing LocalAI as local-ai to $BINDIR..."
-    $SUDO install -o0 -g0 -m755 -d $BINDIR
-    $SUDO install -o0 -g0 -m755 $TEMP_DIR/local-ai $BINDIR/local-ai
-
-    verify_system
-    if [ "$HAS_SYSTEMD" = true ]; then
-        configure_systemd
-    fi
-
-    # WSL2 only supports GPUs via nvidia passthrough
-    # so check for nvidia-smi to determine if GPU is available
-    if [ "$IS_WSL2" = true ]; then
-        if available nvidia-smi && [ -n "$(nvidia-smi | grep -o "CUDA Version: [0-9]*\.[0-9]*")" ]; then
-            info "Nvidia GPU detected."
-        fi
-        install_success
-        exit 0
-    fi
-
-    # Install GPU dependencies on Linux
-    if ! available lspci && ! available lshw; then
-        warn "Unable to detect NVIDIA/AMD GPU. Install lspci or lshw to automatically detect and install GPU dependencies."
-        exit 0
-    fi
-
-    if [ "$HAS_AMD" = true ]; then
-        install_amd
-    fi
-
-    if [ "$HAS_CUDA" = true ]; then
-        if check_gpu nvidia-smi; then
-            info "NVIDIA GPU installed."
-            exit 0
-        fi
-
-        install_cuda
-    fi
-
-    install_success
-    warn "No NVIDIA/AMD GPU detected. LocalAI will run in CPU-only mode."
-    exit 0
-}
-
-detect_start_command() {
-    STARTCOMMAND="run"
-    if [ "$WORKER" = true ]; then
-        if [ -n "$P2P_TOKEN" ]; then
-            STARTCOMMAND="worker p2p-llama-cpp-rpc"
-        else
-            STARTCOMMAND="worker llama-cpp-rpc"
-        fi
-    elif [ "$FEDERATED" = true ]; then
-        if [ "$FEDERATED_SERVER" = true ]; then
-            STARTCOMMAND="federated"
-        else
-            STARTCOMMAND="$STARTCOMMAND --p2p --federated"
-        fi
-    elif [ -n "$P2P_TOKEN" ]; then
-        STARTCOMMAND="$STARTCOMMAND --p2p"
-    fi
-}
-
-SUDO=
-if [ "$(id -u)" -ne 0 ]; then
-    # Running as root, no need for sudo
-    if ! available sudo; then
-        fatal "This script requires superuser permissions. Please re-run as root."
-    fi
-
-    SUDO="sudo"
-fi
-
-# Check if uninstall flag is provided
-if [ "$1" = "--uninstall" ]; then
-    uninstall_localai
-fi
-
-detect_start_command
-
-OS="$(uname -s)"
-
-ARCH=$(uname -m)
-case "$ARCH" in
-    x86_64) ARCH="amd64" ;;
-    aarch64|arm64) ARCH="arm64" ;;
-    *) fatal "Unsupported architecture: $ARCH" ;;
-esac
-
-if [ "$OS" = "Darwin" ]; then
-    install_binary_darwin
-    exit 0
-fi
-
-if check_gpu lspci amdgpu || check_gpu lshw amdgpu; then
-    HAS_AMD=true
-fi
-
-if check_gpu lspci nvidia || check_gpu lshw nvidia; then
-    HAS_CUDA=true
-fi
-
-if check_gpu lspci intel || check_gpu lshw intel; then
-    HAS_INTEL=true
-fi
-
-PACKAGE_MANAGER=
-for PACKAGE_MANAGER in dnf yum apt-get; do
-    if available $PACKAGE_MANAGER; then
-        break
-    fi
-done
-
-if [ "$DOCKER_INSTALL" = "true" ]; then
-    info "Installing LocalAI from container images"
-    if [ "$HAS_CUDA" = true ]; then
-        install_container_toolkit
-    fi
-    install_docker
-else
-    info "Installing LocalAI from binaries"
-    install_binary
-fi
--- a/go.mod
+++ b/go.mod
@@ -6,10 +6,10 @@ toolchain go1.24.5

 require (
 	dario.cat/mergo v1.0.2
-	fyne.io/fyne/v2 v2.7.2
+	fyne.io/fyne/v2 v2.7.3
 	github.com/Masterminds/sprig/v3 v3.3.0
 	github.com/alecthomas/kong v1.14.0
-	github.com/anthropics/anthropic-sdk-go v1.22.0
+	github.com/anthropics/anthropic-sdk-go v1.26.0
 	github.com/charmbracelet/glamour v0.10.0
 	github.com/containerd/containerd v1.7.30
 	github.com/dhowden/tag v0.0.0-20240417053706-3d75831295e8
@@ -21,7 +21,7 @@ require (
 	github.com/gofrs/flock v0.13.0
 	github.com/google/go-containerregistry v0.20.7
 	github.com/google/uuid v1.6.0
-	github.com/gpustack/gguf-parser-go v0.23.1
+	github.com/gpustack/gguf-parser-go v0.24.0
 	github.com/hpcloud/tail v1.0.0
 	github.com/ipfs/go-log v1.0.5
 	github.com/jaypipes/ghw v0.23.0
@@ -33,7 +33,7 @@ require (
 	github.com/mholt/archiver/v3 v3.5.1
 	github.com/microcosm-cc/bluemonday v1.0.27
 	github.com/modelcontextprotocol/go-sdk v1.3.0
-	github.com/mudler/cogito v0.9.1-0.20260217143801-bb7f986ed2c7
+	github.com/mudler/cogito v0.9.1
 	github.com/mudler/edgevpn v0.31.1
 	github.com/mudler/go-processmanager v0.1.0
 	github.com/mudler/memory v0.0.0-20251216220809-d1256471a6c2
@@ -101,7 +101,7 @@ require (
 	github.com/go-gl/glfw/v3.3/glfw v0.0.0-20240506104042-037f3cc74f2a // indirect
 	github.com/go-task/slim-sprig/v3 v3.0.0 // indirect
 	github.com/go-text/render v0.2.0 // indirect
-	github.com/go-text/typesetting v0.2.1 // indirect
+	github.com/go-text/typesetting v0.3.3 // indirect
 	github.com/godbus/dbus/v5 v5.1.0 // indirect
 	github.com/google/jsonschema-go v0.4.2 // indirect
 	github.com/hack-pad/go-indexeddb v0.3.2 // indirect
--- a/go.sum
+++ b/go.sum
@@ -8,8 +8,8 @@ dmitri.shuralyov.com/app/changes v0.0.0-20180602232624-0a106ad413e3/go.mod h1:Yl
 dmitri.shuralyov.com/html/belt v0.0.0-20180602232347-f7d459c86be0/go.mod h1:JLBrvjyP0v+ecvNYvCpyZgu5/xkfAUhi6wJj28eUfSU=
 dmitri.shuralyov.com/service/change v0.0.0-20181023043359-a85b471d5412/go.mod h1:a1inKt/atXimZ4Mv927x+r7UpyzRUf4emIoiiSC2TN4=
 dmitri.shuralyov.com/state v0.0.0-20180228185332-28bcc343414c/go.mod h1:0PRwlb0D6DFvNNtx+9ybjezNCa8XF0xaYcETyp6rHWU=
-fyne.io/fyne/v2 v2.7.2 h1:XiNpWkn0PzX43ZCjbb0QYGg1RCxVbugwfVgikWZBCMw=
-fyne.io/fyne/v2 v2.7.2/go.mod h1:PXbqY3mQmJV3J1NRUR2VbVgUUx3vgvhuFJxyjRK/4Ug=
+fyne.io/fyne/v2 v2.7.3 h1:xBT/iYbdnNHONWO38fZMBrVBiJG8rV/Jypmy4tVfRWE=
+fyne.io/fyne/v2 v2.7.3/go.mod h1:gu+dlIcZWSzKZmnrY8Fbnj2Hirabv2ek+AKsfQ2bBlw=
 fyne.io/systray v1.12.0 h1:CA1Kk0e2zwFlxtc02L3QFSiIbxJ/P0n582YrZHT7aTM=
 fyne.io/systray v1.12.0/go.mod h1:RVwqP9nYMo7h5zViCBHri2FgjXF7H2cub7MAq4NSoLs=
 git.apache.org/thrift.git v0.0.0-20180902110319-2566ecd5d999/go.mod h1:fPE2ZNJGynbRyZ4dJvy6G277gSllfV2HJqblrnkyeyg=
@@ -44,8 +44,8 @@ github.com/andybalholm/brotli v1.0.1/go.mod h1:loMXtMfwqflxFJPmdbJO0a3KNoPuLBgiu
 github.com/andybalholm/brotli v1.2.0 h1:ukwgCxwYrmACq68yiUqwIWnGY0cTPox/M94sVwToPjQ=
 github.com/andybalholm/brotli v1.2.0/go.mod h1:rzTDkvFWvIrjDXZHkuS16NPggd91W3kUSvPlQ1pLaKY=
 github.com/anmitsu/go-shlex v0.0.0-20161002113705-648efa622239/go.mod h1:2FmKhYUyUczH0OGQWaF5ceTx0UBShxjsH6f8oGKYe2c=
-github.com/anthropics/anthropic-sdk-go v1.22.0 h1:sgo4Ob5pC5InKCi/5Ukn5t9EjPJ7KTMaKm5beOYt6rM=
-github.com/anthropics/anthropic-sdk-go v1.22.0/go.mod h1:WTz31rIUHUHqai2UslPpw5CwXrQP3geYBioRV4WOLvE=
+github.com/anthropics/anthropic-sdk-go v1.26.0 h1:oUTzFaUpAevfuELAP1sjL6CQJ9HHAfT7CoSYSac11PY=
+github.com/anthropics/anthropic-sdk-go v1.26.0/go.mod h1:qUKmaW+uuPB64iy1l+4kOSvaLqPXnHTTBKH6RVZ7q5Q=
 github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
 github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
 github.com/aymanbagabas/go-udiff v0.2.0 h1:TK0fH4MteXUDspT88n8CKzvK0X9O2xu9yQjWpi6yML8=
@@ -128,6 +128,8 @@ github.com/distribution/reference v0.6.0 h1:0IXCQ5g4/QMHHkarYzh5l+u8T3t73zM5Qvfr
 github.com/distribution/reference v0.6.0/go.mod h1:BbU0aIcezP1/5jX/8MP0YiH4SdvB5Y4f/wlDRiLyi3E=
 github.com/dlclark/regexp2 v1.11.0 h1:G/nrcoOa7ZXlpoa/91N3X7mM3r8eIlMBBJZvsz/mxKI=
 github.com/dlclark/regexp2 v1.11.0/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8=
+github.com/dnaeon/go-vcr v1.2.0 h1:zHCHvJYTMh1N7xnV7zf1m1GPBF9Ad0Jk/whtQ1663qI=
+github.com/dnaeon/go-vcr v1.2.0/go.mod h1:R4UdLID7HZT3taECzJs4YgbbH6PIGXB6W/sc5OLb6RQ=
 github.com/docker/cli v29.0.3+incompatible h1:8J+PZIcF2xLd6h5sHPsp5pvvJA+Sr2wGQxHkRl53a1E=
 github.com/docker/cli v29.0.3+incompatible/go.mod h1:JLrzqnKDaYBop7H2jaqPtU4hHvMKP+vjCwu2uszcLI8=
 github.com/docker/distribution v2.8.3+incompatible h1:AtKxIZ36LoNK51+Z6RpzLpddBirtxJnzDrHLEKxTAYk=
@@ -218,10 +220,10 @@ github.com/go-task/slim-sprig/v3 v3.0.0 h1:sUs3vkvUymDpBKi3qH1YSqBQk9+9D/8M2mN1v
 github.com/go-task/slim-sprig/v3 v3.0.0/go.mod h1:W848ghGpv3Qj3dhTPRyJypKRiqCdHZiAzKg9hl15HA8=
 github.com/go-text/render v0.2.0 h1:LBYoTmp5jYiJ4NPqDc2pz17MLmA3wHw1dZSVGcOdeAc=
 github.com/go-text/render v0.2.0/go.mod h1:CkiqfukRGKJA5vZZISkjSYrcdtgKQWRa2HIzvwNN5SU=
-github.com/go-text/typesetting v0.2.1 h1:x0jMOGyO3d1qFAPI0j4GSsh7M0Q3Ypjzr4+CEVg82V8=
-github.com/go-text/typesetting v0.2.1/go.mod h1:mTOxEwasOFpAMBjEQDhdWRckoLLeI/+qrQeBCTGEt6M=
-github.com/go-text/typesetting-utils v0.0.0-20241103174707-87a29e9e6066 h1:qCuYC+94v2xrb1PoS4NIDe7DGYtLnU2wWiQe9a1B1c0=
-github.com/go-text/typesetting-utils v0.0.0-20241103174707-87a29e9e6066/go.mod h1:DDxDdQEnB70R8owOx3LVpEFvpMK9eeH1o2r0yZhFI9o=
+github.com/go-text/typesetting v0.3.3 h1:ihGNJU9KzdK2QRDy1Bm7FT5RFQoYb+3n3EIhI/4eaQc=
+github.com/go-text/typesetting v0.3.3/go.mod h1:vIRUT25mLQaSh4C8H/lIsKppQz/Gdb8Pu/tNwpi52ts=
+github.com/go-text/typesetting-utils v0.0.0-20250618110550-c820a94c77b8 h1:4KCscI9qYWMGTuz6BpJtbUSRzcBrUSSE0ENMJbNSrFs=
+github.com/go-text/typesetting-utils v0.0.0-20250618110550-c820a94c77b8/go.mod h1:3/62I4La/HBRX9TcTpBj4eipLiwzf+vhI+7whTc9V7o=
 github.com/go-yaml/yaml v2.1.0+incompatible/go.mod h1:w2MrLa16VYP0jy6N7M5kHaCkaLENm+P+Tv+MfurjSw0=
 github.com/goccy/go-yaml v1.18.0 h1:8W7wMFS12Pcas7KU+VVkaiCng+kG8QiFeFwzFb+rwuw=
 github.com/goccy/go-yaml v1.18.0/go.mod h1:XBurs7gK8ATbW4ZPGKgcbrY1Br56PdM69F7LkFRi1kA=
@@ -294,8 +296,8 @@ github.com/gorilla/css v1.0.1 h1:ntNaBIghp6JmvWnxbZKANoLyuXTPZ4cAMlo6RyhlbO8=
 github.com/gorilla/css v1.0.1/go.mod h1:BvnYkspnSzMmwRK+b8/xgNPLiIuNZr6vbZBTPQ2A3b0=
 github.com/gorilla/websocket v1.5.3 h1:saDtZ6Pbx/0u+bgYQ3q96pZgCzfhKXGPqt7kZ72aNNg=
 github.com/gorilla/websocket v1.5.3/go.mod h1:YR8l580nyteQvAITg2hZ9XVh4b55+EU/adAjf1fMHhE=
-github.com/gpustack/gguf-parser-go v0.23.1 h1:0U7DOrsi7ryx2L/dlMy+BSQ5bJV4AuMEIgGBs4RK46A=
-github.com/gpustack/gguf-parser-go v0.23.1/go.mod h1:y4TwTtDqFWTK+xvprOjRUh+dowgU2TKCX37vRKvGiZ0=
+github.com/gpustack/gguf-parser-go v0.24.0 h1:tdJceXYp9e5RhE9RwVYIuUpir72Jz2D68NEtDXkKCKc=
+github.com/gpustack/gguf-parser-go v0.24.0/go.mod h1:y4TwTtDqFWTK+xvprOjRUh+dowgU2TKCX37vRKvGiZ0=
 github.com/gregjones/httpcache v0.0.0-20180305231024-9cad4c3443a7/go.mod h1:FecbI9+v66THATjSRHfNgh1IVFe/9kFxbXtjV0ctIMA=
 github.com/grpc-ecosystem/grpc-gateway v1.5.0/go.mod h1:RSKVYQBd5MCa4OVpNdGskqpgL2+G+NZTnrVHpWWfpdw=
 github.com/grpc-ecosystem/grpc-gateway v1.16.0 h1:gmcG1KaJ57LophUzW0Hy8NmPhnMZb4M0+kPpLofRdBo=
@@ -509,10 +511,8 @@ github.com/morikuni/aec v1.0.0/go.mod h1:BbKIizmSmc5MMPqRYbxO4ZU0S0+P200+tUnFx7P
 github.com/mr-tron/base58 v1.1.2/go.mod h1:BinMc/sQntlIE1frQmRFPUoPA1Zkr8VRgBdjWI2mNwc=
 github.com/mr-tron/base58 v1.2.0 h1:T/HDJBh4ZCPbU39/+c3rRvE0uKBQlU27+QI8LJ4t64o=
 github.com/mr-tron/base58 v1.2.0/go.mod h1:BinMc/sQntlIE1frQmRFPUoPA1Zkr8VRgBdjWI2mNwc=
-github.com/mudler/cogito v0.8.2-0.20260214201734-da0d4ceb2b44 h1:joGszpItINnZdoL/0p2077Wz2xnxMGRSRgYN5mS7I4c=
-github.com/mudler/cogito v0.8.2-0.20260214201734-da0d4ceb2b44/go.mod h1:6sfja3lcu2nWRzEc0wwqGNu/eCG3EWgij+8s7xyUeQ4=
-github.com/mudler/cogito v0.9.1-0.20260217143801-bb7f986ed2c7 h1:z3AcM7LbaQb+C955JdSXksHB9B0uWGQpdgl05gJM+9Y=
-github.com/mudler/cogito v0.9.1-0.20260217143801-bb7f986ed2c7/go.mod h1:6sfja3lcu2nWRzEc0wwqGNu/eCG3EWgij+8s7xyUeQ4=
+github.com/mudler/cogito v0.9.1 h1:6y7VPHSS+Q+v4slV42XcjykN5wip4N7C/rXTwWPBVFM=
+github.com/mudler/cogito v0.9.1/go.mod h1:6sfja3lcu2nWRzEc0wwqGNu/eCG3EWgij+8s7xyUeQ4=
 github.com/mudler/edgevpn v0.31.1 h1:7qegiDWd0kAg6ljhNHxqvp8hbo/6BbzSdbb7/2WZfiY=
 github.com/mudler/edgevpn v0.31.1/go.mod h1:ftV5B0nKFzm4R8vR80UYnCb2nf7lxCRgAALxUEEgCf8=
 github.com/mudler/go-piper v0.0.0-20241023091659-2494246fd9fc h1:RxwneJl1VgvikiX28EkpdAyL4yQVnJMrbquKospjHyA=
--- a/pkg/utils/ffmpeg.go
+++ b/pkg/utils/ffmpeg.go
@@ -42,6 +42,21 @@ func AudioToWav(src, dst string) error {
 	return nil
 }

+// AudioResample resamples an audio file to the given sample rate using ffmpeg.
+// If sampleRate <= 0, it is a no-op and returns src unchanged.
+func AudioResample(src string, sampleRate int) (string, error) {
+	if sampleRate <= 0 {
+		return src, nil
+	}
+	dst := strings.Replace(src, ".wav", fmt.Sprintf("_%dhz.wav", sampleRate), 1)
+	commandArgs := []string{"-y", "-i", src, "-ar", fmt.Sprintf("%d", sampleRate), dst}
+	out, err := ffmpegCommand(commandArgs)
+	if err != nil {
+		return "", fmt.Errorf("error resampling audio: %w out: %s", err, out)
+	}
+	return dst, nil
+}
+
 // AudioConvert converts generated wav file from tts to other output formats.
 // TODO: handle pcm to have 100% parity of supported format from OpenAI
 func AudioConvert(src string, format string) (string, error) {
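
Review note on `AudioResample`: the destination path comes from `strings.Replace(src, ".wav", ..., 1)`, which rewrites the first `.wav` anywhere in the path. If `src` contains no `.wav` at all, `dst` equals `src`, so ffmpeg would be asked to write over its own input. Below is a minimal sketch of an extension-based alternative. `resampledPath` is a hypothetical helper, not part of this diff, and falling back to `.wav` for extension-less inputs is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resampledPath is a hypothetical helper (not in the diff). It inserts
// "_<rate>hz" before the file extension of src, touching only the final
// path element's extension rather than the first ".wav" in the string.
func resampledPath(src string, sampleRate int) string {
	ext := filepath.Ext(src) // "" when the last element has no extension
	base := strings.TrimSuffix(src, ext)
	if ext == "" {
		// Assumption: TTS output is WAV, so default to that extension.
		ext = ".wav"
	}
	return fmt.Sprintf("%s_%dhz%s", base, sampleRate, ext)
}

func main() {
	fmt.Println(resampledPath("/tmp/out.wav", 16000))
	fmt.Println(resampledPath("/tmp/my.wav.cache/out.wav", 16000))
	fmt.Println(resampledPath("/tmp/out", 16000))
}
```

With this shape the result always differs from `src`, including when a directory name contains `.wav` or the input has no extension.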
--- a/swagger/docs.go
+++ b/swagger/docs.go
@@ -3526,6 +3526,10 @@ const docTemplate = `{
                    "description": "(optional) output format",
                    "type": "string"
                },
+                "sample_rate": {
+                    "description": "(optional) desired output sample rate",
+                    "type": "integer"
+                },
                "stream": {
                    "description": "(optional) enable streaming TTS",
                    "type": "boolean"
--- a/swagger/swagger.json
+++ b/swagger/swagger.json
@@ -3519,6 +3519,10 @@
                    "description": "(optional) output format",
                    "type": "string"
                },
+                "sample_rate": {
+                    "description": "(optional) desired output sample rate",
+                    "type": "integer"
+                },
                "stream": {
                    "description": "(optional) enable streaming TTS",
                    "type": "boolean"
--- a/swagger/swagger.yaml
+++ b/swagger/swagger.yaml
@@ -1362,6 +1362,9 @@ definitions:
      response_format:
        description: (optional) output format
        type: string
+      sample_rate:
+        description: (optional) desired output sample rate
+        type: integer
      stream:
        description: (optional) enable streaming TTS
        type: boolean
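
Review note on the new `sample_rate` field: a client could encode the optional fields from this schema hunk roughly as follows. This is a sketch only. The `ttsOptions` type and the `"wav"` value are assumptions for illustration, and the request's other fields are left out because this hunk does not show them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ttsOptions is a hypothetical client-side type covering only the three
// optional properties visible in the swagger hunk above.
type ttsOptions struct {
	ResponseFormat string `json:"response_format,omitempty"`
	SampleRate     int    `json:"sample_rate,omitempty"`
	Stream         bool   `json:"stream,omitempty"`
}

// encodeOptions marshals the options to JSON. Zero values are omitted,
// so an unset sample rate is simply absent from the payload.
func encodeOptions(o ttsOptions) (string, error) {
	b, err := json.Marshal(o)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := encodeOptions(ttsOptions{ResponseFormat: "wav", SampleRate: 16000})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Omitting a zero `sample_rate` matches `AudioResample` in `pkg/utils/ffmpeg.go`, which treats `sampleRate <= 0` as a no-op.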
Author	SHA1	Message	Date
Ettore Di Giacinto	e169492543	fix: this backend is CUDA only Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-02-26 23:08:24 +00:00
Ettore Di Giacinto	51c26f1f39	feat(backends): add faster-qwen3-tts Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-02-26 23:00:59 +00:00
LocalAI [bot]	65082b3a6f	fix: Add named volumes for Windows Docker compatibility (#8661) - Added named volumes (models, images) to docker-compose.yaml - Added named volumes (models, backends) to .devcontainer/docker-compose-devcontainer.yml - Changed bind mounts to named volumes for Windows compatibility Fixes #8455 Signed-off-by: localai-bot <localai-bot@users.noreply.github.com> Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>	2026-02-26 23:18:53 +01:00
Ettore Di Giacinto	0483d47674	Change condition for dependabot job in workflow Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-02-26 23:17:33 +01:00
LocalAI [bot]	8ad40091a6	chore: ⬆️ Update ggml-org/llama.cpp to `723c71064da0908c19683f8c344715fbf6d986fd` (#8660) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-02-26 21:34:47 +00:00
LocalAI [bot]	8bfe458fbc	fix: change file permissions from 0600 to 0644 in InstallModel (#8657) Closes #8119 When installing models from the gallery, files are created with 0600 permissions (owner read/write only), making them unreadable by the LocalAI server when running as a different user. This fix changes the permissions to 0644 (owner read/write, group/others read), allowing the server to read model files regardless of the user it runs as. Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>	2026-02-26 09:38:54 +01:00
Ettore Di Giacinto	657ba8cdad	fix(chat): do not send thinking/reasoning messages to the LLM (#8656) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-02-26 00:06:35 +01:00
LocalAI [bot]	fb86f6461d	chore: ⬆️ Update ggml-org/llama.cpp to `3769fe6eb70b0a0fbb30b80917f1caae68c902f7` (#8655) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-02-26 00:05:03 +01:00
LocalAI [bot]	1027c487a6	fix: reload model after editing YAML config (issue #8647) (#8652) fix: reload model configuration after editing (issue #8647) - Add *model.ModelLoader parameter to EditModelEndpoint - Call ml.ShutdownModel() after saving config to unload the running model - Model will be reloaded on next inference request with new settings (e.g., context_size) - Update route registration to pass ml to EditModelEndpoint Signed-off-by: localai-bot <localai-bot@users.noreply.github.com> Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>	2026-02-25 22:18:42 +01:00
LocalAI [bot]	bb226d1eaa	feat(swagger): update swagger (#8654) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-02-25 21:52:35 +01:00
Ettore Di Giacinto	b032cf489b	fix(chatterbox): add support for cuda13/aarch64 (#8653) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-02-25 21:51:44 +01:00
Copilot	3ac7301f31	Add `sample_rate` support to TTS API via post-processing resampling (#8650) * Initial plan * Add TTS sample_rate support via AudioResample post-processing Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-02-25 16:36:27 +01:00
dependabot[bot]	c4783a0a05	chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/vllm (#8635) Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1. - [Release notes](https://github.com/grpc/grpc/releases) - [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1) --- updated-dependencies: - dependency-name: grpcio dependency-version: 1.78.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-25 08:17:32 +01:00
dependabot[bot]	c44f03b882	chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/rerankers (#8636) chore(deps): bump grpcio in /backend/python/rerankers Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1. - [Release notes](https://github.com/grpc/grpc/releases) - [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1) --- updated-dependencies: - dependency-name: grpcio dependency-version: 1.78.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-25 08:16:57 +01:00
dependabot[bot]	eeec92af78	chore(deps): bump sentence-transformers from 5.2.2 to 5.2.3 in /backend/python/transformers (#8638) chore(deps): bump sentence-transformers in /backend/python/transformers Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers) from 5.2.2 to 5.2.3. - [Release notes](https://github.com/huggingface/sentence-transformers/releases) - [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.2.2...v5.2.3) --- updated-dependencies: - dependency-name: sentence-transformers dependency-version: 5.2.3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-25 08:16:41 +01:00
dependabot[bot]	842033b8b5	chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/transformers (#8640) chore(deps): bump grpcio in /backend/python/transformers Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1. - [Release notes](https://github.com/grpc/grpc/releases) - [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1) --- updated-dependencies: - dependency-name: grpcio dependency-version: 1.78.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-25 08:14:55 +01:00
dependabot[bot]	a2941228a7	chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/common/template (#8641) chore(deps): bump grpcio in /backend/python/common/template Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1. - [Release notes](https://github.com/grpc/grpc/releases) - [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1) --- updated-dependencies: - dependency-name: grpcio dependency-version: 1.78.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-25 08:14:43 +01:00
dependabot[bot]	791e6b84ee	chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/coqui (#8642) Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1. - [Release notes](https://github.com/grpc/grpc/releases) - [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1) --- updated-dependencies: - dependency-name: grpcio dependency-version: 1.78.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-25 08:14:30 +01:00
LocalAI [bot]	d845c39963	docs: add Podman installation documentation (#8646) * docs: add Podman installation documentation - Add new podman.md with comprehensive installation and usage guide - Cover installation on multiple platforms (Ubuntu, Fedora, Arch, macOS, Windows) - Document GPU support (NVIDIA CUDA, AMD ROCm, Intel, Vulkan) - Include rootless container configuration - Document Docker Compose with podman-compose - Add troubleshooting section for common issues - Link to Podman documentation in installation index - Update image references to use Docker Hub and link to docker docs - Change YAML heredoc to EOF in compose.yaml example - Add curly brackets to notice shortcode and fix link Closes #8645 Signed-off-by: localai-bot <localai-bot@users.noreply.github.com> * docs: merge Docker and Podman docs into unified Containers guide Following the review comment, we have merged the Docker and Podman documentation into a single 'Containers' page that covers both container engines. The Docker and Podman pages now redirect to this unified guide. Changes: - Added new docs/content/installation/containers.md with combined Docker/Podman guide - Updated docs/content/installation/docker.md to redirect to containers - Updated docs/content/installation/podman.md to redirect to containers - Updated docs/content/installation/_index.en.md to link to containers Signed-off-by: LocalAI [bot] <localai-bot@users.noreply.github.com> Signed-off-by: localai-bot <localai-bot@users.noreply.github.com> * docs: remove podman.md as docs are merged into containers.md Signed-off-by: localai-bot <localai-bot@users.noreply.github.com> --------- Signed-off-by: localai-bot <localai-bot@users.noreply.github.com> Signed-off-by: LocalAI [bot] <localai-bot@users.noreply.github.com> Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>	2026-02-25 08:13:55 +01:00
LocalAI [bot]	1331e23b67	chore: ⬆️ Update ggml-org/llama.cpp to `418dea39cea85d3496c8b04a118c3b17f3940ad8` (#8649 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-02-25 00:04:48 +00:00
LocalAI [bot]	36ff2a0138	fix(webui): use different icon for System nav item (#8648 ) Change the System nav item icon from fas fa-server to fas fa-desktop to distinguish it from the Backends nav item which still uses fa-server. Signed-off-by: localai-bot <localai-bot@users.noreply.github.com> Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>	2026-02-24 17:10:58 +01:00
LocalAI [bot]	db6ba4ef07	chore: remove install.sh script and documentation references (#8643 ) * chore: remove install.sh script and documentation references - Delete docs/static/install.sh (broken installer causing issues) - Remove One-Line Installer section from linux.md documentation - Remove install.sh references from installation/_index.en.md - Remove install.sh warning and commands from README.md Closes #8032 * fix: add missing closing braces to notice shortcode	2026-02-24 08:36:25 +01:00
dependabot[bot]	d19dcac863	chore(deps): bump github.com/mudler/cogito from 0.9.1-0.20260217143801-bb7f986ed2c7 to 0.9.1 (#8632) chore(deps): bump github.com/mudler/cogito Bumps [github.com/mudler/cogito](https://github.com/mudler/cogito) from 0.9.1-0.20260217143801-bb7f986ed2c7 to 0.9.1. - [Release notes](https://github.com/mudler/cogito/releases) - [Commits](https://github.com/mudler/cogito/commits/v0.9.1) --- updated-dependencies: - dependency-name: github.com/mudler/cogito dependency-version: 0.9.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-24 01:29:47 +00:00
dependabot[bot]	fd42675bec	chore(deps): bump goreleaser/goreleaser-action from 6 to 7 (#8634) Bumps [goreleaser/goreleaser-action](https://github.com/goreleaser/goreleaser-action) from 6 to 7. - [Release notes](https://github.com/goreleaser/goreleaser-action/releases) - [Commits](https://github.com/goreleaser/goreleaser-action/compare/v6...v7) --- updated-dependencies: - dependency-name: goreleaser/goreleaser-action dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-23 23:27:49 +01:00
dependabot[bot]	3391538806	chore(deps): bump actions/stale from 10.1.1 to 10.2.0 (#8633) Bumps [actions/stale](https://github.com/actions/stale) from 10.1.1 to 10.2.0. - [Release notes](https://github.com/actions/stale/releases) - [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md) - [Commits](`997185467f...b5d41d4e1d`) --- updated-dependencies: - dependency-name: actions/stale dependency-version: 10.2.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-23 23:27:20 +01:00
dependabot[bot]	c4f879c4ea	chore(deps): bump github.com/gpustack/gguf-parser-go from 0.23.1 to 0.24.0 (#8631) chore(deps): bump github.com/gpustack/gguf-parser-go Bumps [github.com/gpustack/gguf-parser-go](https://github.com/gpustack/gguf-parser-go) from 0.23.1 to 0.24.0. - [Release notes](https://github.com/gpustack/gguf-parser-go/releases) - [Commits](https://github.com/gpustack/gguf-parser-go/compare/v0.23.1...v0.24.0) --- updated-dependencies: - dependency-name: github.com/gpustack/gguf-parser-go dependency-version: 0.24.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-23 23:26:41 +01:00
dependabot[bot]	b7e0de54fe	chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.22.0 to 1.26.0 (#8630) chore(deps): bump github.com/anthropics/anthropic-sdk-go Bumps [github.com/anthropics/anthropic-sdk-go](https://github.com/anthropics/anthropic-sdk-go) from 1.22.0 to 1.26.0. - [Release notes](https://github.com/anthropics/anthropic-sdk-go/releases) - [Changelog](https://github.com/anthropics/anthropic-sdk-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/anthropics/anthropic-sdk-go/compare/v1.22.0...v1.26.0) --- updated-dependencies: - dependency-name: github.com/anthropics/anthropic-sdk-go dependency-version: 1.26.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-23 23:26:15 +01:00
dependabot[bot]	f0868acdf3	chore(deps): bump fyne.io/fyne/v2 from 2.7.2 to 2.7.3 (#8629) Bumps [fyne.io/fyne/v2](https://github.com/fyne-io/fyne) from 2.7.2 to 2.7.3. - [Release notes](https://github.com/fyne-io/fyne/releases) - [Changelog](https://github.com/fyne-io/fyne/blob/v2.7.3/CHANGELOG.md) - [Commits](https://github.com/fyne-io/fyne/compare/v2.7.2...v2.7.3) --- updated-dependencies: - dependency-name: fyne.io/fyne/v2 dependency-version: 2.7.3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-02-23 23:25:46 +01:00
LocalAI [bot]	9a5b5ee8a9	chore: ⬆️ Update ggml-org/llama.cpp to `b68a83e641b3ebe6465970b34e99f3f0e0a0b21a` (#8628) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-02-23 22:02:40 +00:00
Lukas Schaefer	ed0bfb8732	fix: rename json_verbose to verbose_json (#8627) Signed-off-by: Lukas Schaefer <lukas@lschaefer.xyz>	2026-02-23 17:57:06 +00:00
Richard Palethorpe	be84b1d258	feat(traces): Use accordian instead of pop-ups (#8626) Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-02-23 13:07:41 +01:00
Andres	cbedcc9091	fix(api): Downgrade health/readiness check to debug (#8625) Downgrade health/readiness check to debug Signed-off-by: Andres Smith <andressmithdev@pm.me>	2026-02-23 11:58:04 +01:00
Andres	e45d63c86e	fix(cli): Fix watchdog running constantly and spamming logs (#8624) * Fix watchdog running constantly and spamming logs Signed-off-by: Andres Smith <andressmithdev@pm.me> * Update docs Signed-off-by: Andres Smith <andressmithdev@pm.me> --------- Signed-off-by: Andres Smith <andressmithdev@pm.me>	2026-02-23 11:57:28 +01:00
LocalAI [bot]	f40c8dd0ce	chore: ⬆️ Update ggml-org/llama.cpp to `2b6dfe824de8600c061ef91ce5cc5c307f97112c` (#8622) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-02-23 09:30:58 +00:00
LocalAI [bot]	559ab99890	docs: update diffusers multi-GPU documentation to mention tensor_parallel_size configuration (#8621) * docs: update diffusers multi-GPU documentation to mention tensor_parallel_size configuration * chore: revert backend/python/diffusers/README.md to original content --------- Co-authored-by: Your Name <you@example.com>	2026-02-22 18:17:23 +01:00
LocalAI [bot]	91f2dd5820	chore: ⬆️ Update ggml-org/llama.cpp to `f75c4e8bf52ea480ece07fd3d9a292f1d7f04bc5` (#8619) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-02-22 13:20:08 +01:00
LocalAI [bot]	8250815763	docs: ⬆️ update docs version mudler/LocalAI (#8618) ⬆️ Update docs version mudler/LocalAI Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-02-21 21:18:40 +00:00
Richard Palethorpe	b1b67b973e	fix(realtime): Add functions to conversation history (#8616) Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-02-21 19:03:49 +01:00