fix(ci): switch apt mirror per runner — azure on github-hosted, kernel.org on self-hosted

Self-hosted runners (arc-runner-set, bigger-runner) cannot reach azure.archive.ubuntu.com: they live in different networks (e.g. our arc-runner-set Kubernetes cluster) where Azure's mirror IP is not routable. Symptom: "Connection failed [IP: 51.11.236.225 80]" with each Ign:/Err: cycle taking 60s, hanging the build for ~16 minutes before exit 100.

Pick the mirror based on `runner.environment`:

* github-hosted (ubuntu-latest, ubuntu-24.04-arm) → Azure (http://azure.archive.ubuntu.com / http://azure.ports.ubuntu.com), same VPC as the runner.
* self-hosted (arc-runner-set, bigger-runner) → kernel.org (https://mirrors.edge.kernel.org for both archive and ports), publicly reachable from any network.

The choice now lives in one place: the .github/actions/configure-apt-mirror composite action exposes `effective-mirror` / `effective-ports-mirror` outputs so the reusable workflows can forward the same value as Docker build-args without duplicating the per-runner-environment branch.

The now-redundant `apt-mirror` / `apt-ports-mirror` workflow inputs on image_build.yml and backend_build.yml are dropped; defaults live in the composite action and are visible there.

Assisted-by: Claude:claude-opus-4-7[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
feat(ci): allow routing apt traffic through an alternate Ubuntu mirror (#9650)
2026-07-03 21:07:33 -04:00 · 2026-05-03 22:59:26 +00:00 · 2026-05-03 23:50:13 +02:00 · 2026-05-03 09:06:31 +02:00 · 2026-05-02 23:37:04 +00:00 · 2026-05-02 22:42:08 +02:00
2082 changed files with 564572 additions and 57198 deletions
--- a/.agents/adding-backends.md
+++ b/.agents/adding-backends.md
@@ -0,0 +1,215 @@
+# Adding a New Backend
+
+When adding a new backend to LocalAI, you need to update several files to ensure the backend is properly built, tested, and registered. Here's a step-by-step guide based on the pattern used for adding backends like `moonshine`:
+
+## 1. Create Backend Directory Structure
+
+Create the backend directory under the appropriate location:
+- **Python backends**: `backend/python/<backend-name>/`
+- **Go backends**: `backend/go/<backend-name>/`
+- **C++ backends**: `backend/cpp/<backend-name>/`
+- **Rust backends**: `backend/rust/<backend-name>/`
+
+For Python backends, you'll typically need:
+- `backend.py` - Main gRPC server implementation
+- `Makefile` - Build configuration
+- `install.sh` - Installation script for dependencies
+- `protogen.sh` - Protocol buffer generation script
+- `requirements.txt` - Python dependencies
+- `run.sh` - Runtime script
+- `test.py` / `test.sh` - Test files
+
+For Rust backends, you'll typically need (see `backend/rust/kokoros/` as a reference):
+- `Cargo.toml` - Crate manifest; depend on the upstream project as a submodule under `sources/`
+- `build.rs` - Invokes `tonic_build` to generate gRPC stubs from `backend/backend.proto` (use the `BACKEND_PROTO_PATH` env var so the Makefile can inject the canonical copy)
+- `src/` - The gRPC server implementation (implement `Backend` via `tonic`)
+- `Makefile` - Copies `backend.proto` into the crate, runs `cargo build --release`, then `package.sh`
+- `package.sh` - Uses `ldd` to bundle the binary's dynamic deps and `ld.so` into `package/lib/`
+- `run.sh` - Sets `LD_LIBRARY_PATH`/`SSL_CERT_DIR` and execs the binary via the bundled `lib/ld.so`
+- `sources/<UpstreamProject>/` - Git submodule with the upstream Rust crate
+
+## 2. Add Build Configurations to `.github/workflows/backend.yml`
+
+Add build matrix entries for each platform/GPU type you want to support. Look at similar backends for reference — `chatterbox`/`faster-whisper` for Python, `piper`/`silero-vad` for Go, `kokoros` for Rust.
+
+**Without an entry here no image is ever built or pushed, and the gallery entry in `backend/index.yaml` will point at a tag that does not exist.** The `dockerfile:` field must point at `./backend/Dockerfile.<lang>` matching the language bucket from step 1 (e.g. `Dockerfile.python`, `Dockerfile.golang`, `Dockerfile.rust`). The `tag-suffix` must match the `uri:` in the corresponding `backend/index.yaml` image entry exactly.
+
+If you add a new language bucket, `scripts/changed-backends.js` also needs a branch in `inferBackendPath` so PR change-detection routes file edits correctly.
+
+**Placement in file:**
+- CPU builds: Add after other CPU builds (e.g., after `cpu-chatterbox`)
+- CUDA 12 builds: Add after other CUDA 12 builds (e.g., after `gpu-nvidia-cuda-12-chatterbox`)
+- CUDA 13 builds: Add after other CUDA 13 builds (e.g., after `gpu-nvidia-cuda-13-chatterbox`)
+
+**Additional build types you may need:**
+- ROCm/HIP: Use `build-type: 'hipblas'` with `base-image: "rocm/dev-ubuntu-24.04:7.2.1"`
+- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"`
+- L4T (ARM): Use `build-type: 'l4t'` with `platforms: 'linux/arm64'` and `runs-on: 'ubuntu-24.04-arm'`
+
+## 3. Add Backend Metadata to `backend/index.yaml`
+
+**Step 3a: Add Meta Definition**
+
+Add a YAML anchor definition in the `## metas` section (around lines 2-300). Use a similar backend such as `diffusers` or `chatterbox` as a template.
+
+**Step 3b: Add Image Entries**
+
+Add image entries at the end of the file, following the pattern of similar backends such as `diffusers` or `chatterbox`. Include both `latest` (production) and `master` (development) tags.
+
+## 4. Update the Makefile
+
+The Makefile needs to be updated in several places to support building and testing the new backend:
+
+**Step 4a: Add to `.NOTPARALLEL`**
+
+Add `backends/<backend-name>` to the `.NOTPARALLEL` line (around line 2) to prevent parallel execution conflicts:
+
+```makefile
+.NOTPARALLEL: ... backends/<backend-name>
+```
+
+**Step 4b: Add to `prepare-test-extra`**
+
+Add the backend to the `prepare-test-extra` target to prepare it for testing. Use the path matching your language bucket (`backend/python/`, `backend/go/`, `backend/rust/`, …):
+
+```makefile
+prepare-test-extra: protogen-python
+	...
+	$(MAKE) -C backend/<lang>/<backend-name>
+```
+
+For Rust backends the target is usually the crate build target itself (e.g. `$(MAKE) -C backend/rust/<backend-name> <backend-name>-grpc`) so the binary is in place before `test` runs.
+
+**Step 4c: Add to `test-extra`**
+
+Add the backend to the `test-extra` target to run its tests — applies to Go and Rust backends too, not only Python:
+
+```makefile
+test-extra: prepare-test-extra
+	...
+	$(MAKE) -C backend/<lang>/<backend-name> test
+```
+
+Each backend's own `Makefile` should define a `test` target so this line works regardless of language. Integration tests that need large model downloads should be gated behind an env var (see `backend/rust/kokoros/`'s `KOKOROS_MODEL_PATH` pattern) so CI only runs unit tests.
+
+**Step 4d: Add Backend Definition**
+
+Add a backend definition variable in the backend definitions section (around lines 428-457). The format depends on the backend type:
+
+**For Python backends with root context** (like `faster-whisper`, `coqui`):
+```makefile
+BACKEND_<BACKEND_NAME> = <backend-name>|python|.|false|true
+```
+
+**For Python backends with `./backend` context** (like `chatterbox`, `moonshine`):
+```makefile
+BACKEND_<BACKEND_NAME> = <backend-name>|python|./backend|false|true
+```
+
+**For Go backends**:
+```makefile
+BACKEND_<BACKEND_NAME> = <backend-name>|golang|.|false|true
+```
+
+**For Rust backends**:
+```makefile
+BACKEND_<BACKEND_NAME> = <backend-name>|rust|.|false|true
+```
+
+The language field (`python`/`golang`/`rust`/…) must match a `backend/Dockerfile.<lang>` file.
+
+**Step 4e: Generate Docker Build Target**
+
+Add an eval call to generate the docker-build target (around lines 480-501):
+
+```makefile
+$(eval $(call generate-docker-build-target,$(BACKEND_<BACKEND_NAME>)))
+```
+
+**Step 4f: Add to `docker-build-backends`**
+
+Add `docker-build-<backend-name>` to the `docker-build-backends` target (around line 507):
+
+```makefile
+docker-build-backends: ... docker-build-<backend-name>
+```
+
+**Determining the Context:**
+
+- If the backend is in `backend/python/<backend-name>/` and uses `./backend` as context in the workflow file, use `./backend` context
+- If the backend is in `backend/python/<backend-name>/` but uses `.` as context in the workflow file, use `.` context
+- Check similar backends to determine the correct context
+
+## 5. Verification Checklist
+
+After adding a new backend, verify:
+
+- [ ] Backend directory structure is complete with all necessary files
+- [ ] Build configurations added to `.github/workflows/backend.yml` for all desired platforms
+- [ ] Meta definition added to `backend/index.yaml` in the `## metas` section
+- [ ] Image entries added to `backend/index.yaml` for all build variants (latest + development)
+- [ ] Tag suffixes match between workflow file and index.yaml
+- [ ] Makefile updated with all 6 required changes (`.NOTPARALLEL`, `prepare-test-extra`, `test-extra`, backend definition, docker-build target eval, `docker-build-backends`)
+- [ ] No YAML syntax errors (check with linter)
+- [ ] No Makefile syntax errors (check with linter)
+- [ ] Follows the same pattern as similar backends (e.g., if it's a transcription backend, follow `faster-whisper` pattern)
+
+## Bundling runtime shared libraries (`package.sh`)
+
+The final `Dockerfile.python` stage is `FROM scratch`: there is no system `libc`, no `apt`, no fallback library path. Only files explicitly copied from the builder stage end up in the backend image. That means any shared library your backend (or its Python deps) loads at runtime via `dlopen` **must** be packaged into `${BACKEND}/lib/`.
+
+Pattern:
+
+1. Make sure the library is installed in the builder stage of `backend/Dockerfile.python` (add it to the top-level `apt-get install`).
+2. Drop a `package.sh` in your backend directory that copies the library — and its soname symlinks — into `$(dirname $0)/lib`. See `backend/python/vllm/package.sh` for a reference implementation that walks `/usr/lib/x86_64-linux-gnu`, `/usr/lib/aarch64-linux-gnu`, etc.
+3. `Dockerfile.python` already runs `package.sh` automatically if it exists, after `package-gpu-libs.sh`.
+4. `libbackend.sh` automatically prepends `${EDIR}/lib` to `LD_LIBRARY_PATH` at run time, so anything packaged this way is found by `dlopen`.
+
+How to find missing libs: when a Python module silently fails to register torch ops or you see `AttributeError: '_OpNamespace' '...' object has no attribute '...'`, run the backend image's Python with `LD_DEBUG=libs` to see which `dlopen` failed. The filename in the error message (e.g. `libnuma.so.1`) is what you need to package.
+
+To verify packaging works without trusting the host:
+
+```bash
+make docker-build-<backend>
+CID=$(docker create --entrypoint=/run.sh local-ai-backend:<backend>)
+docker cp "$CID":/lib /tmp/check && docker rm "$CID"
+ls /tmp/check    # expect the bundled .so files + symlinks
+```
+
+Then boot it inside a fresh `ubuntu:24.04` (which intentionally does *not* have the lib installed) to confirm it actually loads from the backend dir.
+
+## Importer integration
+
+When you add a new backend, you MUST also make it importable via the model import form (`/import-model`). The import form dropdown is sourced dynamically from `GET /backends/known` — it reads the importer registry at `core/gallery/importers/importers.go`, so the steps below are the ONLY way to make your backend show up.
+
+Required steps:
+
+1. **If your backend has unambiguous detection signals** (unique file extension, HF `pipeline_tag`, unique repo name pattern, unique artefact like `modules.json`):
+   - Create an importer file at `core/gallery/importers/<backend>.go` following the Match/Import pattern in `llama-cpp.go`.
+   - Register it in `importers.go:defaultImporters` in **specificity order** — more specific detectors must appear BEFORE more generic ones (e.g. `sentencetransformers` before `transformers`, `stablediffusion-ggml` before `llama-cpp`, `vllm-omni` before `vllm`). First match wins.
+2. **If your backend is a drop-in replacement** (same artefacts as another backend, e.g. `ik-llama-cpp` and `turboquant` both consume GGUF the same way `llama-cpp` does):
+   - Do NOT create a new importer. Extend the existing importer's `Import()` to swap the emitted `backend:` field when `preferences.backend` matches. See `llama-cpp.go` for the pattern.
+3. **If your backend has no reliable auto-detect signal** (preference-only — e.g. `sglang`, `tinygrad`, `whisperx`):
+   - Do NOT create an importer. Instead add the backend name to the curated pref-only slice in `core/http/endpoints/localai/backend.go` that feeds `/backends/known`. A single line addition.
+4. **Always** add a table-driven test in `core/gallery/importers/importers_test.go` (Ginkgo/Gomega):
+   - Use a real public HuggingFace repo URI as the test fixture (existing tests already hit the live HF API — follow that pattern).
+   - Cover detection (auto-match without preferences), preference-override (explicit `backend:` in preferences wins), and — if the backend's modality has a common `pipeline_tag` but ambiguous artefacts — an ambiguity test asserting `errors.Is(err, importers.ErrAmbiguousImport)`.
+
+Rules of thumb:
+
+- When in doubt, lean pref-only. A wrong auto-detect is worse than a forced preference.
+- Never silently emit a modality mismatch (e.g. emit `llama-cpp` for a TTS repo because `.gguf` is present). Return `ErrAmbiguousImport` instead.
+- Registration order is the single most common source of bugs. Check by running `go test ./core/gallery/importers/...` — the existing suite will fail if you've shadowed a pre-existing detector.
+
+## 6. Example: Adding a Python Backend
+
+For reference, when `moonshine` was added:
+- **Files created**: `backend/python/moonshine/{backend.py, Makefile, install.sh, protogen.sh, requirements.txt, run.sh, test.py, test.sh}`
+- **Workflow entries**: 3 build configurations (CPU, CUDA 12, CUDA 13)
+- **Index entries**: 1 meta definition + 6 image entries (cpu, cuda12, and cuda13, each in latest and development variants)
+- **Makefile updates**:
+  - Added to `.NOTPARALLEL` line
+  - Added to `prepare-test-extra` and `test-extra` targets
+  - Added `BACKEND_MOONSHINE = moonshine|python|./backend|false|true`
+  - Added eval for docker-build target generation
+  - Added `docker-build-moonshine` to `docker-build-backends`
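
Note on the importer registration order described in the "Importer integration" section of `.agents/adding-backends.md`: the sketch below illustrates why "first match wins" makes ordering matter. It is a minimal, self-contained illustration, not LocalAI's implementation. The `importer` struct, `firstMatch`, and both detectors (presence of `modules.json`, a `.safetensors` suffix) are hypothetical stand-ins, and the real Match/Import interface in `core/gallery/importers` may look different.

```go
package main

import (
	"fmt"
	"strings"
)

// importer is a hypothetical stand-in for a registry entry; the real
// interface in core/gallery/importers may differ.
type importer struct {
	name  string
	match func(files []string) bool
}

// hasSuffix reports whether any file name ends with the given suffix.
func hasSuffix(files []string, suffix string) bool {
	for _, f := range files {
		if strings.HasSuffix(f, suffix) {
			return true
		}
	}
	return false
}

// hasFile reports whether the exact file name is present.
func hasFile(files []string, name string) bool {
	for _, f := range files {
		if f == name {
			return true
		}
	}
	return false
}

// firstMatch walks the registry in order and returns the first importer
// whose detector accepts the repository ("first match wins").
func firstMatch(registry []importer, files []string) (string, bool) {
	for _, imp := range registry {
		if imp.match(files) {
			return imp.name, true
		}
	}
	return "", false
}

func main() {
	// Illustrative detectors only (assumptions, not the real signals).
	specific := importer{"sentencetransformers", func(f []string) bool { return hasFile(f, "modules.json") }}
	generic := importer{"transformers", func(f []string) bool { return hasSuffix(f, ".safetensors") }}

	repo := []string{"modules.json", "model.safetensors"}

	good, _ := firstMatch([]importer{specific, generic}, repo)
	bad, _ := firstMatch([]importer{generic, specific}, repo)
	fmt.Println("specific first:", good)
	fmt.Println("generic first: ", bad)
}
```

With the specific detector registered first, the repository resolves to the specific backend; with the order reversed, the generic detector accepts the same repository and shadows it. That is the failure mode the existing test suite is described as catching.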
--- a/.agents/adding-gallery-models.md
+++ b/.agents/adding-gallery-models.md
@@ -0,0 +1,111 @@
+# Adding GGUF Models from HuggingFace to the Gallery
+
+When adding a GGUF model from HuggingFace to the LocalAI model gallery, follow this guide.
+
+## Gallery file
+
+All models are defined in `gallery/index.yaml`. Find the appropriate section (embedding models near other embeddings, chat models near similar chat models) and add a new entry.
+
+## Getting the SHA256
+
+GGUF files on HuggingFace expose their SHA256 via the `x-linked-etag` HTTP header. Fetch it with:
+
+```bash
+curl -sI "https://huggingface.co/<org>/<repo>/resolve/main/<filename>.gguf" | grep -i x-linked-etag
+```
+
+The value (without quotes) is the SHA256 hash. Example:
+
+```bash
+curl -sI "https://huggingface.co/ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/resolve/main/embeddinggemma-300m-qat-Q8_0.gguf" | grep -i x-linked-etag
+# x-linked-etag: "6fa0c02a9c302be6f977521d399b4de3a46310a4f2621ee0063747881b673f67"
+```
+
+**Important**: Pay attention to exact filename casing — HuggingFace filenames are case-sensitive (e.g., `Q8_0` vs `q8_0`). Check the repo's file listing to get the exact name.
+
+## Entry format — Embedding models
+
+Embedding models use `gallery/virtual.yaml` as the base config and set `embeddings: true`:
+
+```yaml
+- name: "model-name"
+  url: github:mudler/LocalAI/gallery/virtual.yaml@master
+  urls:
+    - https://huggingface.co/<original-model-org>/<original-model-name>
+    - https://huggingface.co/<gguf-org>/<gguf-repo-name>
+  description: |
+    Short description of the model, its size, and capabilities.
+  tags:
+    - embeddings
+  overrides:
+    backend: llama-cpp
+    embeddings: true
+    parameters:
+      model: <filename>.gguf
+  files:
+    - filename: <filename>.gguf
+      uri: huggingface://<gguf-org>/<gguf-repo-name>/<filename>.gguf
+      sha256: <sha256-hash>
+```
+
+## Entry format — Chat/LLM models
+
+Chat models typically reference a template config (e.g., `gallery/gemma.yaml`, `gallery/chatml.yaml`) that defines the prompt format. Use YAML anchors (`&name` / `*name`) if adding multiple quantization variants of the same model:
+
+```yaml
+- &model-anchor
+  url: "github:mudler/LocalAI/gallery/<template>.yaml@master"
+  name: "model-name"
+  icon: https://example.com/icon.png
+  license: <license>
+  urls:
+    - https://huggingface.co/<org>/<model>
+    - https://huggingface.co/<gguf-org>/<gguf-repo>
+  description: |
+    Model description.
+  tags:
+    - llm
+    - gguf
+    - gpu
+    - cpu
+  overrides:
+    parameters:
+      model: <filename>-Q4_K_M.gguf
+  files:
+    - filename: <filename>-Q4_K_M.gguf
+      sha256: <sha256>
+      uri: huggingface://<gguf-org>/<gguf-repo>/<filename>-Q4_K_M.gguf
+```
+
+To add a variant (e.g., different quantization), use YAML merge:
+
+```yaml
+- !!merge <<: *model-anchor
+  name: "model-name-q8"
+  overrides:
+    parameters:
+      model: <filename>-Q8_0.gguf
+  files:
+    - filename: <filename>-Q8_0.gguf
+      sha256: <sha256>
+      uri: huggingface://<gguf-org>/<gguf-repo>/<filename>-Q8_0.gguf
+```
+
+## Available template configs
+
+Look at existing `.yaml` files in `gallery/` to find the right prompt template for your model architecture:
+
+- `gemma.yaml` — Gemma-family models (gemma, embeddinggemma, etc.)
+- `chatml.yaml` — ChatML format (many Mistral/OpenHermes models)
+- `deepseek.yaml` — DeepSeek models
+- `virtual.yaml` — Minimal base (good for embedding models that don't need chat templates)
+
+## Checklist
+
+1. **Find the GGUF file** on HuggingFace — note exact filename (case-sensitive)
+2. **Get the SHA256** using the `curl -sI` + `x-linked-etag` method above
+3. **Choose the right template** config from `gallery/` based on model architecture
+4. **Add the entry** to `gallery/index.yaml` near similar models
+5. **Set `embeddings: true`** if it's an embedding model
+6. **Include both URLs** — the original model page and the GGUF repo
+7. **Write a description** — mention model size, capabilities, and quantization type
--- a/.agents/ai-coding-assistants.md
+++ b/.agents/ai-coding-assistants.md
@@ -0,0 +1,101 @@
+# AI Coding Assistants
+
+This document provides guidance for AI tools and developers using AI
+assistance when contributing to LocalAI.
+
+**LocalAI follows the same guidelines as the Linux kernel project for
+AI-assisted contributions.** See the upstream policy here:
+<https://docs.kernel.org/process/coding-assistants.html>
+
+The rules below mirror that policy, adapted to LocalAI's license and
+project layout. If anything is unclear, the kernel document is the
+authoritative reference for intent.
+
+AI tools helping with LocalAI development should follow the standard
+project development process:
+
+- [CONTRIBUTING.md](../CONTRIBUTING.md) — development workflow, commit
+  conventions, and PR guidelines
+- [.agents/coding-style.md](coding-style.md) — code style, editorconfig,
+  logging, and documentation conventions
+- [.agents/building-and-testing.md](building-and-testing.md) — build and
+  test procedures
+
+## Licensing and Legal Requirements
+
+All contributions must comply with LocalAI's licensing requirements:
+
+- LocalAI is licensed under the **MIT License** — see the [LICENSE](../LICENSE)
+  file
+- New source files should use the SPDX license identifier `MIT` where
+  applicable to the file type
+- Contributions must be compatible with the MIT License and must not
+  introduce code under incompatible licenses (e.g., GPL) without an
+  explicit discussion with maintainers
+
+## Signed-off-by and Developer Certificate of Origin
+
+**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally
+certify the Developer Certificate of Origin (DCO). The human submitter
+is responsible for:
+
+- Reviewing all AI-generated code
+- Ensuring compliance with licensing requirements
+- Adding their own `Signed-off-by` tag (when the project requires DCO)
+  to certify the contribution
+- Taking full responsibility for the contribution
+
+AI agents MUST NOT add `Co-Authored-By` trailers for themselves either.
+A human reviewer owns the contribution; the AI's involvement is recorded
+via `Assisted-by` (see below).
+
+## Attribution
+
+When AI tools contribute to LocalAI development, proper attribution helps
+track the evolving role of AI in the development process. Contributions
+should include an `Assisted-by` tag in the commit message trailer in the
+following format:
+
+```
+Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
+```
+
+Where:
+
+- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`,
+  `Copilot`, `Cursor`)
+- `MODEL_VERSION` — specific model version used (e.g.,
+  `claude-opus-4-7`, `gpt-5`)
+- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the
+  agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)
+
+Basic development tools (git, go, make, editors) should **not** be listed.
+
+### Example
+
+```
+fix(llama-cpp): handle empty tool call arguments
+
+Previously the parser panicked when the model returned a tool call with
+an empty arguments object. Fall back to an empty JSON object in that
+case so downstream consumers receive a valid payload.
+
+Assisted-by: Claude:claude-opus-4-7 golangci-lint
+Signed-off-by: Jane Developer <jane@example.com>
+```
+
+## Scope and Responsibility
+
+Using an AI assistant does not reduce the contributor's responsibility.
+The human submitter must:
+
+- Understand every line that lands in the PR
+- Verify that generated code compiles, passes tests, and follows the
+  project style
+- Confirm that any referenced APIs, flags, or file paths actually exist
+  in the current tree (AI models may hallucinate identifiers)
+- Not submit AI output verbatim without review
+
+Reviewers may ask for clarification on any change regardless of how it
+was produced. "An AI wrote it" is not an acceptable answer to a design
+question.
--- a/.agents/api-endpoints-and-auth.md
+++ b/.agents/api-endpoints-and-auth.md
@@ -0,0 +1,345 @@
+# API Endpoints and Authentication
+
+This guide covers how to add new API endpoints and properly integrate them with the auth/permissions system.
+
+> **Before you ship a new endpoint or capability surface**, re-read the [checklist at the bottom of this file](#checklist). LocalAI advertises its feature surface in several independent places — miss any one of them and clients/admins/UI won't know the endpoint exists.
+
+## Architecture overview
+
+Authentication and authorization flow through three layers:
+
+1. **Global auth middleware** (`core/http/auth/middleware.go` → `auth.Middleware`) — applied to every request in `core/http/app.go`. Handles session cookies, Bearer tokens, API keys, and legacy API keys. Populates `auth_user` and `auth_role` in the Echo context.
+2. **Feature middleware** (`auth.RequireFeature`) — per-feature access control applied to route groups or individual routes. Checks if the authenticated user has the specific feature enabled.
+3. **Admin middleware** (`auth.RequireAdmin`) — restricts endpoints to admin users only.
+
+When auth is disabled (no auth DB, no legacy API keys), all middleware becomes pass-through (`auth.NoopMiddleware`).
+
+## Adding a new API endpoint
+
+### Step 1: Create the handler
+
+Write the endpoint handler in the appropriate package under `core/http/endpoints/`. Follow existing patterns:
+
+```go
+// core/http/endpoints/localai/my_feature.go
+func MyFeatureEndpoint(app *application.Application) echo.HandlerFunc {
+    return func(c echo.Context) error {
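+        // Use auth.GetUser(c) to get the authenticated user (may be nil if auth is disabled)
+        user := auth.GetUser(c)
+        _ = user // remove once your logic uses user; Go rejects unused variables
+
+        // Your logic here; result is a placeholder for the handler's response payload
+        result := map[string]string{"status": "ok"}
+        return c.JSON(http.StatusOK, result)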
+    }
+}
+```
+
+### Step 2: Register routes
+
+Add routes in the appropriate file under `core/http/routes/`. The file you use depends on the endpoint category:
+
+| File | Category |
+|------|----------|
+| `routes/openai.go` | OpenAI-compatible API endpoints (`/v1/...`) |
+| `routes/localai.go` | LocalAI-specific endpoints (`/api/...`, `/models/...`, `/backends/...`) |
+| `routes/agents.go` | Agent pool endpoints (`/api/agents/...`) |
+| `routes/auth.go` | Auth endpoints (`/api/auth/...`) |
+| `routes/ui_api.go` | UI backend API endpoints |
+
+### Step 3: Apply the right middleware
+
+Choose the appropriate protection level:
+
+#### No auth required (public)
+Exempt paths bypass auth entirely. Add to `isExemptPath()` in `middleware.go` or use the `/api/auth/` prefix (always exempt). Use sparingly — most endpoints should require auth.
+
+#### Standard auth (any authenticated user)
+The global middleware already handles this. API paths (`/api/`, `/v1/`, etc.) automatically require authentication when auth is enabled. You don't need to add any extra middleware.
+
+```go
+router.GET("/v1/my-endpoint", myHandler)  // auth enforced by global middleware
+```
+
+#### Admin only
+Pass `adminMiddleware` to the route. This is set up in `app.go` and passed to `Register*Routes` functions:
+
+```go
+// In the Register function signature, accept the middleware:
+func RegisterMyRoutes(router *echo.Echo, app *application.Application, adminMiddleware echo.MiddlewareFunc) {
+    router.POST("/models/apply", myHandler, adminMiddleware)
+}
+```
+
+#### Feature-gated
+For endpoints that should be toggleable per-user, use feature middleware. There are two approaches:
+
+**Approach A: Route-level middleware** (preferred for groups of related endpoints)
+
+```go
+// In app.go, create the feature middleware:
+myFeatureMw := auth.RequireFeature(application.AuthDB(), auth.FeatureMyFeature)
+
+// Pass it to the route registration function:
+routes.RegisterMyRoutes(e, app, myFeatureMw)
+
+// In the routes file, apply to a group:
+g := e.Group("/api/my-feature", myFeatureMw)
+g.GET("", listHandler)
+g.POST("", createHandler)
+```
+
+**Approach B: RouteFeatureRegistry** (preferred for individual OpenAI-compatible endpoints)
+
+Add an entry to `RouteFeatureRegistry` in `core/http/auth/features.go`. The `RequireRouteFeature` global middleware will automatically enforce it:
+
+```go
+var RouteFeatureRegistry = []RouteFeature{
+    // ... existing entries ...
+    {"POST", "/v1/my-endpoint", FeatureMyFeature},
+}
+```
+
+## Adding a new feature
+
+When you need a new toggleable feature (not just a new endpoint under an existing feature):
+
+### 1. Define the feature constant
+
+Add to `core/http/auth/permissions.go`:
+
+```go
+const (
+    // Add to the appropriate group (declare the constant once, in one group only):
+    // Agent features (default OFF for new users)
+    FeatureMyFeature = "my_feature"
+
+    // OR API features (default ON for new users)
+    // FeatureMyFeature = "my_feature"
+)
+```
+
+Then add it to the appropriate slice:
+
+```go
+// Default OFF — user must be explicitly granted access:
+var AgentFeatures = []string{..., FeatureMyFeature}
+
+// Default ON — user has access unless explicitly revoked:
+var APIFeatures = []string{..., FeatureMyFeature}
+```
+
+### 2. Add feature metadata
+
+In `core/http/auth/features.go`, add to the appropriate `FeatureMetas` function so the admin UI can display it:
+
+```go
+func AgentFeatureMetas() []FeatureMeta {
+    return []FeatureMeta{
+        // ... existing ...
+        {FeatureMyFeature, "My Feature", false},  // false = default OFF
+    }
+}
+```
+
+### 3. Wire up the middleware
+
+In `core/http/app.go`:
+
+```go
+myFeatureMw := auth.RequireFeature(application.AuthDB(), auth.FeatureMyFeature)
+```
+
+Then pass it to the route registration function.
+
+### 4. Register route-feature mappings (if applicable)
+
+If your feature gates standard API endpoints (like `/v1/...`), add entries to `RouteFeatureRegistry` in `features.go` instead of using per-route middleware.
+
+## Accessing the authenticated user in handlers
+
+```go
+import "github.com/mudler/LocalAI/core/http/auth"
+
+func MyHandler(c echo.Context) error {
+    // Get the user (nil when auth is disabled or unauthenticated)
+    user := auth.GetUser(c)
+    if user == nil {
+        // Handle unauthenticated, or let middleware handle it
+    }
+
+    // Check role (guard against nil: user is nil when auth is disabled)
+    if user != nil && user.Role == auth.RoleAdmin {
+        // admin-specific logic
+    }
+
+    // db and modelName are assumed to be in scope in your handler
+    // Check feature access programmatically (when you need conditional behavior, not full blocking)
+    if auth.HasFeatureAccess(db, user, auth.FeatureMyFeature) {
+        // feature-specific logic
+    }
+
+    // Check model access
+    if !auth.IsModelAllowed(db, user, modelName) {
+        return c.JSON(http.StatusForbidden, ...)
+    }
+
+    return nil
+}
+```
+
+## Middleware composition patterns
+
+Middleware can be composed at different levels. Here are the patterns used in the codebase:
+
+### Group-level middleware (agents pattern)
+```go
+// All routes in the group share the middleware
+g := e.Group("/api/agents", poolReadyMw, agentsMw)
+g.GET("", listHandler)
+g.POST("", createHandler)
+```
+
+### Per-route middleware (localai pattern)
+```go
+// Individual routes get middleware as extra arguments
+router.POST("/models/apply", applyHandler, adminMiddleware)
+router.GET("/metrics", metricsHandler, adminMiddleware)
+```
+
+### Middleware slice (openai pattern)
+```go
+// Build a middleware chain for a handler
+chatMiddleware := []echo.MiddlewareFunc{
+    usageMiddleware,
+    traceMiddleware,
+    modelFilterMiddleware,
+}
+app.POST("/v1/chat/completions", chatHandler, chatMiddleware...)
+```
+
+## Error response format
+
+Always use `schema.ErrorResponse` for auth/permission errors to stay consistent with the OpenAI-compatible API:
+
+```go
+return c.JSON(http.StatusForbidden, schema.ErrorResponse{
+    Error: &schema.APIError{
+        Message: "feature not enabled for your account",
+        Code:    http.StatusForbidden,
+        Type:    "authorization_error",
+    },
+})
+```
+
+Use these HTTP status codes:
+- `401 Unauthorized` — no valid credentials provided
+- `403 Forbidden` — authenticated but lacking permission
+- `429 Too Many Requests` — rate limited (auth endpoints)
+
+## Usage tracking
+
+If your endpoint should be tracked for usage (token counts, request counts), add the `usageMiddleware` to its middleware chain. See `core/http/middleware/usage.go` and how it's applied in `routes/openai.go`.
+
+## Advertising surfaces — where to register a new capability
+
+Beyond routing and auth, LocalAI publishes its capability surface in **four independent places**. When you add an endpoint — especially one introducing a net-new capability like a new media type or a new auth-gated feature — you must update every relevant surface. These aren't optional: missing them means the endpoint works but is invisible to clients, admins, and the UI.
+
+### 1. Swagger `@Tags` annotation (mandatory)
+
+Every handler needs a swagger block so the endpoint appears in `/swagger/index.html` and in the `/api/instructions` output. The `@Tags` value is what groups the endpoint into a capability area:
+
+```go
+// MyEndpoint does X.
+// @Summary Do X.
+// @Tags my-capability
+// @Param request body schema.MyRequest true "payload"
+// @Success 200 {object} schema.MyResponse "Response"
+// @Router /v1/my-endpoint [post]
+func MyEndpoint(...) echo.HandlerFunc { ... }
+```
+
+Use an existing tag when the endpoint extends an existing area (e.g. `audio`, `images`, `face-recognition`). Create a new tag only when the endpoint introduces a genuinely new capability surface — and in that case, also register it in step 2.
+
+After adding endpoints, regenerate the embedded spec so the runtime serves it:
+
+```bash
+make protogen-go         # ensures gRPC codegen is fresh first
+make swagger             # regenerates swagger/swagger.json
+```
+
+### 2. `/api/instructions` registry (for new capability areas)
+
+`core/http/endpoints/localai/api_instructions.go` defines `instructionDefs` — a lightweight, machine-readable index of capability areas that groups swagger endpoints by tag. It's the primary discovery surface for agents and SDKs ("what can this server do?").
+
+**When to update:** only when adding a new capability area (a new swagger tag). Existing-tag additions automatically surface without any change here.
+
+Add an entry to `instructionDefs`:
+
+```go
+{
+    Name:        "my-capability",             // URL segment at /api/instructions/my-capability
+    Description: "Short sentence describing the capability",
+    Tags:        []string{"my-capability"},   // must match swagger @Tags
+    Intro:       "Optional gotcha/context that isn't in the swagger descriptions (caveats, defaults, cross-references to other endpoints).",
+},
+```
+
+Also bump the expected-length count in `api_instructions_test.go` and add the name to the `ContainElements` assertion.
+
+### 3. `capabilities.js` symbol (for new model-config FLAG_* flags)
+
+If your feature needs a new `FLAG_*` usecase flag in `core/config/model_config.go` (so users can filter gallery models by it, and so `/v1/models` surfaces it), also declare the matching symbol in `core/http/react-ui/src/utils/capabilities.js`:
+
+```js
+export const CAP_MY_CAPABILITY = 'FLAG_MY_CAPABILITY'
+```
+
+React pages that want to filter the ModelSelector by capability import this symbol. Declare it even if you're not building the UI page yet — the declaration keeps the Go/JS vocabularies in sync.
+
+### 4. `docs/content/` (user-facing documentation)
+
+A new capability deserves its own page under `docs/content/features/`, plus cross-links from related features and an entry in `docs/content/whats-new.md`. See the pattern used by `face-recognition.md` / `object-detection.md`.
+
+## Path protection rules
+
+The global auth middleware classifies paths as API paths or non-API paths:
+
+- **API paths** (always require auth when auth is enabled): `/api/`, `/v1/`, `/models/`, `/backends/`, `/backend/`, `/tts`, `/vad`, `/video`, `/stores/`, `/system`, `/ws/`, `/metrics`
+- **Exempt paths** (never require auth): `/api/auth/` prefix, anything in `appConfig.PathWithoutAuth`
+- **Non-API paths** (UI, static assets): pass through without auth — the React UI handles login redirects client-side
+
+If you add endpoints under a new top-level path prefix, add it to `isAPIPath()` in `middleware.go` to ensure it requires authentication.
+
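The classification order above can be sketched as follows. This is an illustration only: the real check is the Go function `isAPIPath()` in `middleware.go`, and the names `requires_auth`, `paths_without_auth` and the exact-match treatment of `PathWithoutAuth` are assumptions made for the sketch.

```python
# Illustrative only: the real logic lives in Go (isAPIPath() in middleware.go).
API_PREFIXES = (
    "/api/", "/v1/", "/models/", "/backends/", "/backend/", "/tts",
    "/vad", "/video", "/stores/", "/system", "/ws/", "/metrics",
)
EXEMPT_PREFIX = "/api/auth/"


def requires_auth(path, paths_without_auth=(), auth_enabled=True):
    """Return True when the global middleware would demand credentials."""
    if not auth_enabled:
        return False
    # Exemptions are checked first: /api/auth/ sits underneath the /api/ prefix.
    # Assumption: PathWithoutAuth entries are compared as exact paths.
    if path.startswith(EXEMPT_PREFIX) or path in paths_without_auth:
        return False
    # Anything else is protected only if it falls under a known API prefix.
    return path.startswith(API_PREFIXES)


if __name__ == "__main__":
    for p in ("/v1/chat/completions", "/api/auth/login", "/app/index.html", "/mynew/endpoint"):
        print(p, requires_auth(p))
```

In this sketch `/mynew/endpoint` is treated as a non-API path and passes through unauthenticated, which is the failure mode the `isAPIPath()` rule guards against.
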
+## Checklist
+
+When adding a new endpoint:
+
+**Routing & auth**
+- [ ] Handler in `core/http/endpoints/`
+- [ ] Route registered in appropriate `core/http/routes/` file
+- [ ] Auth level chosen: public / standard / admin / feature-gated
+- [ ] Entry added to `RouteFeatureRegistry` in `core/http/auth/features.go` (one row per route/method — all /v1/* routes gate through this, not per-route middleware)
+- [ ] If new feature: constant in `permissions.go`, added to the right slice (`APIFeatures` default-ON / `AgentFeatures` default-OFF), metadata in `features.go` `*FeatureMetas()`
+- [ ] If feature uses group middleware: wired in `core/http/app.go` and passed to the route registration function
+- [ ] If new path prefix: added to `isAPIPath()` in `middleware.go`
+- [ ] If token-counting: `usageMiddleware` added to middleware chain
+
+**Advertising surfaces (easy to miss — see the [Advertising surfaces](#advertising-surfaces--where-to-register-a-new-capability) section)**
+- [ ] Swagger block on the handler: `@Summary`, `@Tags`, `@Param`, `@Success`, `@Router`
+- [ ] If new capability area (new swagger tag): entry in `instructionDefs` in `core/http/endpoints/localai/api_instructions.go` + test count bumped in `api_instructions_test.go`
+- [ ] If new `FLAG_*` usecase flag: matching `CAP_*` symbol exported from `core/http/react-ui/src/utils/capabilities.js`
+- [ ] `docs/content/features/<feature>.md` created; cross-links from related feature pages; entry in `docs/content/whats-new.md`
+
+**Quality**
+- [ ] Error responses use `schema.ErrorResponse` format (or `echo.NewHTTPError` with a mapped gRPC status — see the `mapBackendError` helper in `core/http/endpoints/localai/images.go`)
+- [ ] Tests cover both authenticated and unauthenticated access
+- [ ] Swagger regenerated (`make swagger`) if you changed any `@Router`/`@Tags`/`@Param` annotation
+
+## Companion: MCP admin tool surface
+
+**Required for admin endpoints.** Every new admin endpoint MUST be considered for the MCP admin tool surface — the REST API and the MCP tool catalog can drift silently otherwise, and both the LocalAI Assistant chat modality and the standalone `local-ai mcp-server` rely on `pkg/mcp/localaitools/` to mirror REST.
+
+Two outcomes are acceptable; one is not:
+
+- **Tool added.** The new endpoint is something an admin would manage conversationally (install, list, edit, toggle, upgrade). Follow the full checklist in [.agents/localai-assistant-mcp.md](localai-assistant-mcp.md): add a `LocalAIClient` interface method, implement it in both `inproc` and `httpapi`, register the tool with a `Tool*` constant, update the skill prompts, **and add the route to `toolToHTTPRoute` in `pkg/mcp/localaitools/coverage_test.go`**.
+- **Tool deliberately skipped.** The endpoint is internal/diagnostic and adding a chat path would be misleading. Document the decision in the PR description; no code action.
+- **Forgot.** This breaks the contract. The `TestToolHTTPRouteMappingComplete` test in `pkg/mcp/localaitools` is a partial guard (it checks every `Tool*` has a route mapping), but it does NOT detect new REST endpoints without a tool — that's still a process check on the PR author.
+
+**Add to the bottom of the checklist above**:
+- [ ] If admin: decided whether MCP coverage is needed; if yes, tool registered + map updated; if no, skip-reason in PR description.
--- a/.agents/building-and-testing.md
+++ b/.agents/building-and-testing.md
@@ -0,0 +1,16 @@
+# Build and Testing
+
+Building and testing the project depends on the components involved and the platform where development is taking place. Because of the amount of context required, it is usually best not to build or test the project unless the user requests it. If you must build, inspect the Makefile in the project root and the Makefiles of any backends affected by your changes. The workflows in `.github/workflows` can also serve as a reference when it is unclear how to build or test a component. The primary Makefile contains targets for building inside or outside Docker; if the user has not already stated a preference, ask which they would like to use.
+
+## Building a specified backend
+
+Suppose the user wants to build a particular backend for a given platform, for example coqui for ROCM/hipblas:
+
+- At the time of writing, the Makefile has targets like `docker-build-coqui`, created with `generate-docker-build-target`. Recently added backends may require a new target.
+- At a minimum, set the `BUILD_TYPE` and `BASE_IMAGE` build-args
+  - Use `.github/workflows/backend.yml` as a reference; it lists the needed args in the `include` job strategy matrix
+  - l4t and cublas also require the CUDA major and minor version
+- You can pretty print a command like `DOCKER_MAKEFLAGS=-j$(nproc --ignore=1) BUILD_TYPE=hipblas BASE_IMAGE=rocm/dev-ubuntu-24.04:7.2.1 make docker-build-coqui`
+- Unless the user asks you to run the command, just print it: not all agent frontends handle long-running jobs well, and the output may overflow your context
+- The user may say AMD or ROCM instead of hipblas, Intel instead of SYCL, or NVIDIA instead of l4t or cublas. Ask for confirmation if there is ambiguity.
+- Sometimes the user may need extra parameters to be added to `docker build` (e.g. `--platform` for cross-platform builds or `--progress` to view the full logs), in which case you can generate the `docker build` command directly.
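A minimal sketch of the "resolve the vendor word, then print the command" flow described above. The helper names are hypothetical, and the lowercase build-type strings are placeholders taken from the wording of this guide; check `.github/workflows/backend.yml` for the real `BUILD_TYPE` values before relying on them.

```python
import shlex

# Assumption: vendor words mapped to the build types named in this guide.
VENDOR_ALIASES = {
    "amd": ["hipblas"],
    "rocm": ["hipblas"],
    "intel": ["sycl"],
    "nvidia": ["l4t", "cublas"],  # ambiguous: confirm with the user
}


def resolve_build_type(word):
    """Return (build_type, needs_confirmation) for a user-supplied word."""
    candidates = VENDOR_ALIASES.get(word.lower(), [word.lower()])
    if len(candidates) > 1:
        return None, True
    return candidates[0], False


def docker_build_command(backend, build_type, base_image):
    """Format the make invocation as a string; nothing is executed."""
    env = {"BUILD_TYPE": build_type, "BASE_IMAGE": base_image}
    assignments = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    # $(nproc --ignore=1) is left unquoted so the user's shell expands it.
    return f"DOCKER_MAKEFLAGS=-j$(nproc --ignore=1) {assignments} make docker-build-{backend}"


if __name__ == "__main__":
    build_type, ask = resolve_build_type("ROCM")
    if ask:
        print("Please confirm which build type you want.")
    else:
        print(docker_build_command("coqui", build_type, "rocm/dev-ubuntu-24.04:7.2.1"))
```

The sketch only builds a string, in line with the advice to print long-running commands rather than run them. The extra CUDA version arguments needed for l4t and cublas are left out because their names are not given here.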
--- a/.agents/ci-caching.md
+++ b/.agents/ci-caching.md
@@ -0,0 +1,111 @@
+# CI Build Caching
+
+Container builds — both the root LocalAI image (`Dockerfile`) and the per-backend images (`backend/Dockerfile.*`) — share a registry-backed BuildKit cache. This file explains how that cache is laid out, what invalidates it, and how to bypass it.
+
+## Cache layout
+
+- **Cache registry**: `quay.io/go-skynet/ci-cache`
+- **One tag per matrix entry**, derived from the existing `tag-suffix`:
+  - Backend builds (`backend_build.yml`): `cache<tag-suffix>`
+    - e.g. `cache-gpu-nvidia-cuda-12-llama-cpp`, `cache-cpu-vllm`, `cache-nvidia-l4t-cuda-13-arm64-vllm`
+  - Root image builds (`image_build.yml`): `cache-localai<tag-suffix>`
+    - e.g. `cache-localai-gpu-nvidia-cuda-12`, `cache-localai-gpu-vulkan`
+- Each tag stores a multi-arch BuildKit cache manifest (`mode=max`), so every intermediate stage is re-usable, not just the final image.
+
+## Read/write semantics
+
+| Trigger | `cache-from` | `cache-to` |
+|---|---|---|
+| `push` to `master` / tag | yes | yes (`mode=max,ignore-error=true`) |
+| `pull_request` | yes | **no** |
+
+PR builds read master's warm cache but never write — this prevents PRs from polluting the shared cache with their experimental state. After merge, the master build for that matrix entry refreshes the cache.
+
+`ignore-error=true` on the write side means a transient quay push failure does not fail the build; the next master push retries.
+
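The tag layout and the read/write rule can be summarised in a few lines. This is a sketch, not the workflow itself: the function names are made up, and the `type=registry,ref=...` strings follow BuildKit's registry cache syntax as an assumption about how the workflows spell their options.

```python
CACHE_REPO = "quay.io/go-skynet/ci-cache"


def cache_tag(tag_suffix, root_image=False):
    """Backend builds use cache<tag-suffix>; root image builds use cache-localai<tag-suffix>."""
    return ("cache-localai" if root_image else "cache") + tag_suffix


def cache_options(tag_suffix, event_name, root_image=False):
    """Return (cache_from, cache_to); cache_to is None when the build must not write."""
    ref = f"{CACHE_REPO}:{cache_tag(tag_suffix, root_image)}"
    cache_from = f"type=registry,ref={ref}"
    if event_name == "pull_request":
        # PRs read the warm cache but never write to it.
        return cache_from, None
    return cache_from, f"type=registry,ref={ref},mode=max,ignore-error=true"


if __name__ == "__main__":
    print(cache_options("-cpu-vllm", "push"))
    print(cache_options("-gpu-vulkan", "pull_request", root_image=True))
```

Because the tag is derived only from `tag-suffix`, two matrix entries with the same suffix would resolve to the same cache reference.
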
+## Self-warming, no separate populator
+
+There is no cron job that pre-warms the cache. The production builds *are* the populator. The first master build of a given matrix entry pays the cold cost; subsequent same-entry master builds reuse everything that hasn't changed (apt installs, gRPC compile in `Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`, Python wheel installs, etc.).
+
+Historically there was a `generate_grpc_cache.yaml` cron that targeted a `grpc` stage in the root Dockerfile. That stage was removed in July 2025 and the cron silently failed every night for 9 months without writing anything. It was deleted along with the registry-cache rollout.
+
+## The `DEPS_REFRESH` cache-buster (Python backends)
+
+Every Python backend goes through the shared `backend/Dockerfile.python`, which ends with:
+
+```dockerfile
+ARG DEPS_REFRESH=initial
+RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
+```
+
+Most Python backends ship `requirements*.txt` files that **do not pin every transitive dep** (`torch`, `transformers`, `vllm`, `diffusers`, etc. are listed without a `==` pin, or with `>=` lower bounds only). With a warm BuildKit cache, the `make` layer hashes only on Dockerfile instructions + COPYed source — not on what `pip install` resolves at runtime. So a warm cache would ship the *first* version of `vllm` ever cached and never pick up upstream releases.
+
+`DEPS_REFRESH` defends against that:
+
+- `backend_build.yml` computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) before each build and passes it as a build-arg.
+- The `RUN ... make` layer's BuildKit hash now includes that string, so the layer invalidates **at most once per week**, automatically picking up newer wheels.
+- Within a week, builds stay warm.
+
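For reference, the weekly key can be reproduced outside the workflow. The sketch below mirrors `date -u +%Y-W%V` using only the standard library; `deps_refresh_key` is a hypothetical helper name, not something that exists in the repository.

```python
from datetime import date, datetime, timezone


def deps_refresh_key(day=None):
    """Mirror `date -u +%Y-W%V`: calendar year plus zero-padded ISO week number."""
    if day is None:
        day = datetime.now(timezone.utc).date()
    iso_week = day.isocalendar()[1]
    return f"{day.year}-W{iso_week:02d}"


if __name__ == "__main__":
    print(deps_refresh_key(date(2026, 4, 22)))  # a fixed date, for comparison with the example above
    print(deps_refresh_key())                   # the key a build started right now would get
```

Every day from Monday to Sunday of the same ISO week maps to the same string, so the `make` layer is rebuilt at most once per week. As with `%Y`, the year part is the calendar year rather than the ISO year, so the key can look unusual around New Year; that is harmless for a cache-buster.
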
+This applies only to `Dockerfile.python` because:
+- Go (`Dockerfile.golang`) pins versions in `go.mod` / `go.sum`.
+- Rust (`Dockerfile.rust`) pins via `Cargo.lock`.
+- C++ backends (`Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`) clone gRPC at a pinned tag (`v1.65.0`) and llama.cpp at a pinned commit; their inputs don't drift between rebuilds.
+
+### Adjusting the cadence
+
+If you need a faster refresh (e.g. while debugging an upstream flake), bump the format to daily (`+%Y-%m-%d`) or hourly (`+%Y-%m-%d-%H`). If you need a one-shot rebuild for a specific backend without changing the schedule, append a marker to the tag-suffix in the matrix or temporarily delete that backend's cache tag in quay.
+
+## Manually evicting cache
+
+To force a fully cold build for one backend or the whole image:
+
+```bash
+# Delete a single tag (requires quay credentials with admin on the repo)
+curl -X DELETE \
+  -H "Authorization: Bearer ${QUAY_TOKEN}" \
+  https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/cache-gpu-nvidia-cuda-12-vllm
+
+# List all tags
+curl -s -H "Authorization: Bearer ${QUAY_TOKEN}" \
+  "https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/?limit=100" | jq '.tags[].name'
+```
+
+Eviction is rarely needed in normal operation: `DEPS_REFRESH` handles weekly drift, source changes invalidate naturally, and the per-entry cache tag keeps each cache scoped to its matrix entry, so a stale tag never bleeds into a different build.
+
+## What the cache **does not** cover
+
+- The "Free Disk Space" / "Release space from worker" steps run on every job — these reclaim ~6 GB on `ubuntu-latest` runners. They are runner-state cleanup, not Docker, and BuildKit caches don't apply.
+- Intermediate artifacts of `Build and push (PR)` are not pushed anywhere — PRs only build for verification.
+- Darwin builds (see below) — macOS runners have no Docker daemon, so the registry-backed BuildKit cache cannot apply.
+
+## Darwin native caches
+
+`backend_build_darwin.yml` runs natively on `macOS-14` GitHub-hosted runners — there is no Docker, no BuildKit, no cross-job registry cache. Instead, the reusable workflow uses `actions/cache@v4` for four native caches that mirror the spirit of the Linux cache (warm by default, weekly refresh for unpinned Python deps, PRs read-only).
+
+| Cache | Path(s) | Key | Scope |
+|---|---|---|---|
+| Go modules + build | `~/go/pkg/mod`, `~/Library/Caches/go-build` | `go.sum` (managed by `actions/setup-go@v5` `cache: true`) | All darwin jobs |
+| Homebrew | `~/Library/Caches/Homebrew/downloads`, selected `/opt/homebrew/Cellar/*` | hash of `backend_build_darwin.yml` | All darwin jobs |
+| ccache (llama.cpp CMake) | `~/Library/Caches/ccache` | pinned `LLAMA_VERSION` from `backend/cpp/llama-cpp/Makefile` | `inputs.backend == 'llama-cpp'` only |
+| Python wheels (uv + pip) | `~/Library/Caches/pip`, `~/Library/Caches/uv` | `inputs.backend` + ISO week (`+%Y-W%V`) + hash of that backend's `requirements*.txt` | `inputs.lang == 'python'` only |
+
+Read/write semantics match the BuildKit cache: `actions/cache/restore` runs every time, `actions/cache/save` is gated on `github.event_name != 'pull_request'`. PRs read master's warm cache but never write back.
+
+The Python wheel cache uses the same ISO-week cache-buster as the Linux `DEPS_REFRESH` build-arg — same problem (unpinned `torch`/`mlx`/`diffusers`/`transformers` resolve to fresh wheels weekly), same ~one-cold-rebuild-per-week solution.
+
+The brew Cellar cache requires `HOMEBREW_NO_AUTO_UPDATE=1` and `HOMEBREW_NO_INSTALL_CLEANUP=1` (set as job-level env). Without those, `brew install` would mutate the very directories that were just restored, defeating the cache.
+
+For ccache, the workflow exports `CMAKE_ARGS=… -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache` via `$GITHUB_ENV` before running `make build-darwin-go-backend`. The Makefile in `backend/cpp/llama-cpp/` already forwards `CMAKE_ARGS` through to each variant build (`fallback`, `grpc`, `rpc-server`), so no script changes are needed. The three variants share most TUs, so ccache dedupes object files across them.
+
+### Cache budget on Darwin
+
+GitHub Actions caches are limited to 10 GB per repo. Steady-state worst case: ~800 MB Go cache + ~2 GB brew Cellar + up to 2 GB ccache + ~1.5 GB × 5 python backends. If the cap is hit, prefer collapsing the per-backend Python keys into a shared `pyenv-darwin-shared-<week>` key (accepts more cross-backend churn for a smaller footprint) before reducing other caches.
+
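The worst-case arithmetic above is easy to check. The sketch uses the rough sizes quoted in this section and treats 1 GB as 1000 MB; the assumption that a shared Python key costs about as much as a single backend's cache is a guess made for illustration.

```python
CAP_MB = 10 * 1000  # 10 GB cap, treating 1 GB as 1000 MB for simplicity

BUDGET_MB = {
    "go": 800,
    "brew_cellar": 2000,
    "ccache": 2000,
    "python_per_key": 1500,
}


def worst_case_mb(python_keys=5):
    """Sum the fixed caches plus one Python wheel cache per key."""
    fixed = BUDGET_MB["go"] + BUDGET_MB["brew_cellar"] + BUDGET_MB["ccache"]
    return fixed + BUDGET_MB["python_per_key"] * python_keys


if __name__ == "__main__":
    for keys in (5, 1):
        total = worst_case_mb(keys)
        print(f"{keys} python key(s): {total} MB, over cap: {total > CAP_MB}")
```

With five per-backend keys the estimate is above the cap, while a single shared key brings it well below, which is why collapsing the Python keys is the first lever to pull.
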
+## Touching the cache pipeline
+
+When changing `image_build.yml`, `backend_build.yml`, or any of the `backend/Dockerfile.*` files:
+
+1. **Don't drop `DEPS_REFRESH=...` from the build-args** without a replacement strategy (lockfiles, pinned requirements). Otherwise master will silently freeze on whichever versions were cached at the time.
+2. **Keep `tag-suffix` unique per matrix entry** — it's the cache namespace. Two matrix entries sharing a tag-suffix would clobber each other's cache.
+3. **Keep `cache-to` gated on `github.event_name != 'pull_request'`** — PRs must not write.
+4. **Keep `ignore-error=true` on `cache-to`** — quay registry hiccups must not fail builds.
--- a/.agents/coding-style.md
+++ b/.agents/coding-style.md
@@ -0,0 +1,60 @@
+# Coding Style
+
+The project has the following .editorconfig:
+
+```
+root = true
+
+[*]
+indent_style = space
+indent_size = 2
+end_of_line = lf
+charset = utf-8
+trim_trailing_whitespace = true
+insert_final_newline = true
+
+[*.go]
+indent_style = tab
+
+[Makefile]
+indent_style = tab
+
+[*.proto]
+indent_size = 2
+
+[*.py]
+indent_size = 4
+
+[*.js]
+indent_size = 2
+
+[*.yaml]
+indent_size = 2
+
+[*.md]
+trim_trailing_whitespace = false
+```
+
+- Use comments sparingly to explain why code does something, not what it does. Comments are there to add context that would be difficult to deduce from reading the code.
+- Prefer modern Go e.g. use `any` not `interface{}`
+
+## Logging
+
+Use `github.com/mudler/xlog` for logging; it has the same API as slog.
+
+## Go tests
+
+All Go tests — including backend tests — must use [Ginkgo](https://onsi.github.io/ginkgo/) (v2) with Gomega matchers, not the stdlib `testing` package with `t.Run` / `t.Errorf`. A test file should register a suite with `RegisterFailHandler(Fail)` in a `TestXxx(t *testing.T)` bootstrap and use `Describe`/`Context`/`It` blocks for the actual cases. Look at any existing `*_test.go` under `core/` or `pkg/` for a template.
+
+Do not mix styles within a package. If you are extending tests in a package that already uses Ginkgo, keep using Ginkgo. If you find stdlib-style Go tests in the tree, treat them as tech debt to be migrated rather than as a pattern to follow.
+
+This is enforced by `golangci-lint` via the `forbidigo` linter (see `.golangci.yml`); calls like `t.Errorf` / `t.Fatalf` / `t.Run` / `t.Skip` / `t.Logf` are flagged. Run `make lint` locally before submitting; the same check runs in CI (`.github/workflows/lint.yml`).
+
+## Documentation
+
+The project documentation is located in `docs/content`. When adding new features or changing existing functionality, it is crucial to update the documentation to reflect these changes. This helps users understand how to use the new capabilities and ensures the documentation stays relevant.
+
+- **Feature Documentation**: If you add a new feature (like a new backend or API endpoint), create a new markdown file in `docs/content/features/` explaining what it is, how to configure it, and how to use it.
+- **Configuration**: If you modify configuration options, update the relevant sections in `docs/content/`.
+- **Examples**: Concrete examples (like YAML configuration blocks) are highly encouraged to help users get started quickly.
+- **Shortcodes**: Use `{{% notice note %}}`, `{{% notice tip %}}`, or `{{% notice warning %}}` for callout boxes. Do **not** use `{{% alert %}}` — that shortcode does not exist in this project's Hugo theme and will break the docs build.
--- a/.agents/debugging-backends.md
+++ b/.agents/debugging-backends.md
@@ -0,0 +1,141 @@
+# Debugging and Rebuilding Backends
+
+When a backend fails at runtime (e.g. a gRPC method error, a Python import error, or a dependency conflict), use this guide to diagnose, fix, and rebuild.
+
+## Architecture Overview
+
+- **Source directory**: `backend/python/<name>/` (or `backend/go/<name>/`, `backend/cpp/<name>/`)
+- **Installed directory**: `backends/<name>/` — this is what LocalAI actually runs. It is populated by `make backends/<name>` which builds a Docker image, exports it, and installs it via `local-ai backends install`.
+- **Virtual environment**: `backends/<name>/venv/` — the installed Python venv (for Python backends). The Python binary is at `backends/<name>/venv/bin/python`.
+
+Editing files in `backend/python/<name>/` does **not** affect the running backend until you rebuild with `make backends/<name>`.
+
+## Diagnosing Failures
+
+### 1. Check the logs
+
+Backend gRPC processes log to LocalAI's stdout/stderr. Look for lines tagged with the backend's model ID:
+
+```
+GRPC stderr id="trl-finetune-127.0.0.1:37335" line="..."
+```
+
+Common error patterns:
+- **"Method not implemented"** — the backend is missing a gRPC method that the Go side calls. The model loader (`pkg/model/initializers.go`) always calls `LoadModel` after `Health`; fine-tuning backends must implement it even as a no-op stub.
+- **Python import errors / `AttributeError`** — usually a dependency version mismatch (e.g. `pyarrow` removing `PyExtensionType`).
+- **"failed to load backend"** — the gRPC process crashed or never started. Check stderr lines for the traceback.
+
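When the log is long, a small filter helps to pull out one backend's lines and match them against the patterns above. This is a sketch that assumes every line follows the `id="..." line="..."` shape of the sample; the function names and hint texts are illustrative, not part of LocalAI.

```python
import re

# Assumption: log lines follow the shape of the sample shown above.
GRPC_LINE = re.compile(r'GRPC (?:stderr|stdout) id="(?P<id>[^"]*)" line="(?P<line>.*)"')

HINTS = {
    "Method not implemented": "backend is missing a gRPC method the Go side calls",
    "AttributeError": "likely a dependency version mismatch",
    "ImportError": "likely a dependency version mismatch",
    "failed to load backend": "gRPC process crashed or never started",
}


def backend_lines(log_text, id_prefix):
    """Return the payload of every GRPC line whose id starts with id_prefix."""
    found = []
    for raw in log_text.splitlines():
        match = GRPC_LINE.search(raw)
        if match and match.group("id").startswith(id_prefix):
            found.append(match.group("line"))
    return found


def classify(line):
    """Return a hint for the first known pattern found in the line, else None."""
    for needle, hint in HINTS.items():
        if needle in line:
            return hint
    return None
```

A typical use is to save the server output to a file, call `backend_lines(text, "trl-finetune")`, and run `classify` over the result to decide which of the fixes below to try first.
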
+### 2. Test the Python environment directly
+
+You can run the installed venv's Python to check imports without starting the full server:
+
+```bash
+backends/<name>/venv/bin/python -c "import datasets; print(datasets.__version__)"
+```
+
+If `pip` is missing from the venv, bootstrap it:
+
+```bash
+backends/<name>/venv/bin/python -m ensurepip
+```
+
+Then use `backends/<name>/venv/bin/python -m pip install ...` to test fixes in the installed venv before committing them to the source requirements.
+
+### 3. Check upstream dependency constraints
+
+When you hit a dependency conflict, check what the main library expects. For example, TRL's upstream `requirements.txt`:
+
+```
+https://github.com/huggingface/trl/blob/main/requirements.txt
+```
+
+Pin minimum versions in the backend's requirements files to match upstream.
+
+## Common Fixes
+
+### Missing gRPC methods
+
+If the Go side calls a method the backend doesn't implement (e.g. `LoadModel`), add a no-op stub in `backend.py`:
+
+```python
+def LoadModel(self, request, context):
+    """No-op — actual loading happens elsewhere."""
+    return backend_pb2.Result(success=True, message="OK")
+```
+
+The gRPC contract requires `LoadModel` to succeed for the model loader to return a usable client, even if the backend doesn't need upfront model loading.
+
+### Dependency version conflicts
+
+Python backends often break when a transitive dependency releases a breaking change (e.g. `pyarrow` removing `PyExtensionType`). Steps:
+
+1. Identify the broken import in the logs
+2. Test in the installed venv: `backends/<name>/venv/bin/python -c "import <module>"`
+3. Check upstream requirements for version constraints
+4. Update **all** requirements files in `backend/python/<name>/`:
+   - `requirements.txt` — base deps (grpcio, protobuf)
+   - `requirements-cpu.txt` — CPU-specific (includes PyTorch CPU index)
+   - `requirements-cublas12.txt` — CUDA 12
+   - `requirements-cublas13.txt` — CUDA 13
+5. Rebuild: `make backends/<name>`
+
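Step 4 is the one most easily done incompletely. The sketch below lists, for each `requirements*.txt` file in a backend directory, the lines that mention a given package, so a file without a constraint stands out. `find_pins` is a hypothetical helper and the matching is deliberately simple (it ignores comments, extras and `-r` includes).

```python
import re
from pathlib import Path


def find_pins(backend_dir, package):
    """Map each requirements*.txt file name to the lines that mention `package`."""
    # The lookahead stops `pyarrow` from also matching `pyarrow-hotfix`.
    name = re.compile(rf"^\s*{re.escape(package)}(?![\w.-])", re.IGNORECASE)
    found = {}
    for req in sorted(Path(backend_dir).glob("requirements*.txt")):
        lines = [l.strip() for l in req.read_text().splitlines() if name.match(l)]
        found[req.name] = lines
    return found


if __name__ == "__main__":
    import sys

    # Usage: python find_pins.py backend/python/<name> <package>
    for filename, lines in find_pins(sys.argv[1], sys.argv[2]).items():
        print(filename, lines or "no constraint found")
```

An empty list for a file does not always mean a bug, since some files intentionally carry only accelerator-specific packages, but it is a prompt to check whether the constraint belongs there too.
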
+### PyTorch index conflicts (uv resolver)
+
+The Docker build uses `uv` for pip installs. When `--extra-index-url` points to the PyTorch wheel index, `uv` may refuse to fetch packages like `requests` from PyPI if it finds a different version on the PyTorch index first. Fix this by adding `--index-strategy=unsafe-first-match` to `install.sh`:
+
+```bash
+EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
+installRequirements
+```
+
+Most Python backends already do this — check `backend/python/transformers/install.sh` or similar for reference.
+
+## Rebuilding
+
+### Rebuild a single backend
+
+```bash
+make backends/<name>
+```
+
+This runs the Docker build (`Dockerfile.python`), exports the image to `backend-images/<name>.tar`, and installs it into `backends/<name>/`. It also rebuilds the `local-ai` Go binary (without extra tags).
+
+**Important**: If you were previously running with `GO_TAGS=auth`, the `make backends/<name>` step will overwrite your binary without that tag. Rebuild the Go binary afterward:
+
+```bash
+GO_TAGS=auth make build
+```
+
+### Rebuild and restart
+
+After rebuilding a backend, you must restart LocalAI for it to pick up the new backend files. The backend gRPC process is spawned on demand when the model is first loaded.
+
+```bash
+# Kill existing process
+kill <pid>
+
+# Restart
+./local-ai run --debug [your flags]
+```
+
+### Quick iteration (skip Docker rebuild)
+
+For fast iteration on a Python backend's `backend.py` without a full Docker rebuild, you can edit the installed copy directly:
+
+```bash
+# Edit the installed copy
+vim backends/<name>/backend.py
+
+# Restart LocalAI to respawn the gRPC process
+```
+
+This is useful for testing but **does not persist** — the next `make backends/<name>` will overwrite it. Always commit fixes to the source in `backend/python/<name>/`.
+
+## Verification
+
+After fixing and rebuilding:
+
+1. Start LocalAI and confirm the backend registers: look for `Registering backend name="<name>"` in the logs
+2. Trigger the operation that failed (e.g. start a fine-tuning job)
+3. Watch the GRPC stderr/stdout lines for the backend's model ID
+4. Confirm no errors in the traceback
--- a/.agents/llama-cpp-backend.md
+++ b/.agents/llama-cpp-backend.md
@@ -0,0 +1,77 @@
+# llama.cpp Backend
+
+The llama.cpp backend (`backend/cpp/llama-cpp/grpc-server.cpp`) is a gRPC adaptation of the upstream HTTP server (`llama.cpp/tools/server/server.cpp`). It uses the same underlying server infrastructure from `llama.cpp/tools/server/server-context.cpp`.
+
+## Building and Testing
+
+- Test llama.cpp backend compilation: `make backends/llama-cpp`
+- The backend is built as part of the main build process
+- Check `backend/cpp/llama-cpp/Makefile` for build configuration
+
+## Architecture
+
+- **grpc-server.cpp**: gRPC server implementation, adapts HTTP server patterns to gRPC
+- Uses shared server infrastructure: `server-context.cpp`, `server-task.cpp`, `server-queue.cpp`, `server-common.cpp`
+- The gRPC server mirrors the HTTP server's functionality but uses gRPC instead of HTTP
+
+## Common Issues When Updating llama.cpp
+
+When fixing compilation errors after upstream changes:
+1. Check how `server.cpp` (HTTP server) handles the same change
+2. Look for new public APIs or getter methods
+3. Store copies of needed data instead of accessing private members
+4. Update function calls to match new signatures
+5. Test with `make backends/llama-cpp`
+
+## Key Differences from HTTP Server
+
+- gRPC uses `BackendServiceImpl` class with gRPC service methods
+- HTTP server uses `server_routes` with HTTP handlers
+- Both use the same `server_context` and task queue infrastructure
+- gRPC methods: `LoadModel`, `Predict`, `PredictStream`, `Embedding`, `Rerank`, `TokenizeString`, `GetMetrics`, `Health`
+
+## Tool Call Parsing Maintenance
+
+When working on JSON/XML tool call parsing functionality, always check llama.cpp for the reference implementation and any updates:
+
+### Checking for XML Parsing Changes
+
+1. **Review XML Format Definitions**: Check `llama.cpp/common/chat-parser-xml-toolcall.h` for `xml_tool_call_format` struct changes
+2. **Review Parsing Logic**: Check `llama.cpp/common/chat-parser-xml-toolcall.cpp` for parsing algorithm updates
+3. **Review Format Presets**: Check `llama.cpp/common/chat-parser.cpp` for new XML format presets (search for `xml_tool_call_format form`)
+4. **Review Model Lists**: Check `llama.cpp/common/chat.h` for `COMMON_CHAT_FORMAT_*` enum values that use XML parsing:
+   - `COMMON_CHAT_FORMAT_GLM_4_5`
+   - `COMMON_CHAT_FORMAT_MINIMAX_M2`
+   - `COMMON_CHAT_FORMAT_KIMI_K2`
+   - `COMMON_CHAT_FORMAT_QWEN3_CODER_XML`
+   - `COMMON_CHAT_FORMAT_APRIEL_1_5`
+   - `COMMON_CHAT_FORMAT_XIAOMI_MIMO`
+   - Any new formats added
+
+### Model Configuration Options
+
+Always check `llama.cpp` for new model configuration options that should be supported in LocalAI:
+
+1. **Check Server Context**: Review `llama.cpp/tools/server/server-context.cpp` for new parameters
+2. **Check Chat Params**: Review `llama.cpp/common/chat.h` for `common_chat_params` struct changes
+3. **Check Server Options**: Review `llama.cpp/tools/server/server.cpp` for command-line argument changes
+4. **Examples of options to check**:
+   - `ctx_shift` - Context shifting support
+   - `parallel_tool_calls` - Parallel tool calling
+   - `reasoning_format` - Reasoning format options
+   - Any new flags or parameters
+
+### Implementation Guidelines
+
+1. **Feature Parity**: Always aim for feature parity with llama.cpp's implementation
+2. **Test Coverage**: Add tests for new features matching llama.cpp's behavior
+3. **Documentation**: Update relevant documentation when adding new formats or options
+4. **Backward Compatibility**: Ensure changes don't break existing functionality
+
+### Files to Monitor
+
+- `llama.cpp/common/chat-parser-xml-toolcall.h` - Format definitions
+- `llama.cpp/common/chat-parser-xml-toolcall.cpp` - Parsing logic
+- `llama.cpp/common/chat-parser.cpp` - Format presets and model-specific handlers
+- `llama.cpp/common/chat.h` - Format enums and parameter structures
+- `llama.cpp/tools/server/server-context.cpp` - Server configuration options
--- a/.agents/localai-assistant-mcp.md
+++ b/.agents/localai-assistant-mcp.md
@@ -0,0 +1,97 @@
+# LocalAI Assistant — admin MCP server
+
+This document is the contract for **anyone** (human or AI agent) touching LocalAI's admin REST surface, the in-process MCP server that wraps it, or the embedded skill prompts that teach the assistant how to use it. Read this before adding/removing/renaming admin endpoints, MCP tools, or skill recipes.
+
+## What this feature is
+
+`pkg/mcp/localaitools/` is a public Go package that exposes LocalAI's admin/management surface as an MCP server. It is used in two ways:
+
+1. **In-process**: when an admin opens a chat with `metadata.localai_assistant=true`, the chat handler injects the in-memory MCP server (paired `net.Pipe()` transport, no HTTP loopback) so the LLM can install models, manage backends and edit configs by chatting.
+2. **Standalone**: the `local-ai mcp-server --target=…` subcommand serves the same MCP server over stdio, talking HTTP to a remote LocalAI instance.
+
+The two modes share **all** tool definitions and skill prompts. They differ only in their `LocalAIClient` implementation (`inproc/` calls services directly; `httpapi/` calls REST).
+
+## The three things you must keep in sync
+
+When you change LocalAI's admin surface, three layers must stay aligned:
+
+1. **REST endpoint** in `core/http/endpoints/localai/*.go`.
+2. **MCP tool registration** in `pkg/mcp/localaitools/tools_*.go`, plus a method on `LocalAIClient` (in `client.go`) and implementations in both `inproc/client.go` **and** `httpapi/client.go`.
+3. **Skill prompt** under `pkg/mcp/localaitools/prompts/skills/*.md` — the markdown that teaches the LLM how to use the new tool. If the new tool fits an existing recipe, update that recipe; otherwise add a new file.
+
+If you ship a REST endpoint without (2) and (3), conversational admins won't see the feature.
+
+## Checklist for adding a new admin endpoint
+
+- [ ] REST endpoint exists in `core/http/endpoints/localai/*.go` and is gated by `auth.RequireAdmin()` in `core/http/routes/localai.go`.
+- [ ] `LocalAIClient` interface in `pkg/mcp/localaitools/client.go` has a method covering the new operation.
+- [ ] DTOs added/updated in `pkg/mcp/localaitools/dto.go` (JSON-tagged; never expose raw service types).
+- [ ] `inproc/client.go` implements the new method by calling the service directly (not via HTTP loopback).
+- [ ] `httpapi/client.go` implements the new method by calling the REST endpoint.
+- [ ] Tool registration added in the appropriate `pkg/mcp/localaitools/tools_*.go`. Mutating tools must reference safety rule 1 in the description.
+- [ ] If the tool is mutating, ensure `Options{DisableMutating: true}` skips it (mirror the pattern in `tools_models.go`).
+- [ ] Skill prompt added or updated under `pkg/mcp/localaitools/prompts/skills/`. The prompt must instruct the LLM when to call the tool, what to ask the user first, and what to do on error.
+- [ ] Tests:
+   - `pkg/mcp/localaitools/server_test.go` adds the tool name to `expectedFullCatalog`, and also to `expectedReadOnlyCatalog` if the tool is read-only.
+   - Tool dispatch is added to `TestEachToolDispatchesToClient`.
+   - `pkg/mcp/localaitools/httpapi/client_test.go` covers the new HTTP path.
+
+## Adding a new skill recipe (no new tool)
+
+Sometimes you want to teach the LLM a new pattern that uses existing tools. Drop a markdown file under `pkg/mcp/localaitools/prompts/skills/<verb>_<noun>.md`. The file is automatically embedded by `//go:embed` and assembled into the system prompt in lexicographic order. No Go changes needed.
+
+Conventions:
+- Filename: `<verb>_<noun>.md` (e.g. `install_chat_model.md`, `upgrade_backend.md`).
+- First line: `# Skill: <Title Case description>`.
+- Number the steps. Reference exact tool names in backticks.
+- If the skill mutates state, remind the LLM to confirm with the user.
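+
+The embed-and-assemble behaviour described above can be sketched as follows. This is a minimal illustration, not the loader in `prompts.go`: the `assemble` helper is hypothetical, and an in-memory `fstest.MapFS` stands in for the `//go:embed` filesystem so the example runs on its own.
+
+```go
+package main
+
+import (
+	"fmt"
+	"io/fs"
+	"strings"
+	"testing/fstest"
+)
+
+// assemble concatenates every .md file in dir, in filename order.
+// fs.ReadDir returns entries sorted by filename, which yields the
+// lexicographic ordering without any explicit sort.
+func assemble(fsys fs.FS, dir string) (string, error) {
+	entries, err := fs.ReadDir(fsys, dir)
+	if err != nil {
+		return "", err
+	}
+	var b strings.Builder
+	for _, e := range entries {
+		if e.IsDir() || !strings.HasSuffix(e.Name(), ".md") {
+			continue
+		}
+		data, err := fs.ReadFile(fsys, dir+"/"+e.Name())
+		if err != nil {
+			return "", err
+		}
+		b.Write(data)
+		b.WriteString("\n")
+	}
+	return b.String(), nil
+}
+
+func main() {
+	// In-memory stand-in for the embedded skills directory (assumption).
+	fsys := fstest.MapFS{
+		"skills/upgrade_backend.md":    {Data: []byte("# Skill: Upgrade A Backend")},
+		"skills/install_chat_model.md": {Data: []byte("# Skill: Install A Chat Model")},
+	}
+	out, err := assemble(fsys, "skills")
+	if err != nil {
+		panic(err)
+	}
+	// install_chat_model.md sorts before upgrade_backend.md.
+	fmt.Print(out)
+}
+```
+
+Because the order falls out of the filenames, a new recipe slots into the prompt without any Go changes.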
+
+## Code conventions
+
+These rules guard against the magic-literal drift that surfaced in the first audit. Do not re-introduce bare strings.
+
+- **Tool names** always come from the `Tool*` constants in `pkg/mcp/localaitools/tools.go`. Tool registrations, the test catalog (`server_test.go`'s `expectedFullCatalog` / `expectedReadOnlyCatalog`), and dispatch tables reference the constants. The embedded skill prompts under `prompts/` keep bare strings — that's the one allowed exception, and `TestPromptsContainSafetyAnchors` enforces alignment.
+- **Toggle/pin actions** use the `modeladmin.Action` type (`pkg/mcp/localaitools` and `core/services/modeladmin`). Use `ActionEnable`/`ActionDisable`/`ActionPin`/`ActionUnpin`; never bare `"enable"`/`"pin"` strings.
+- **Capability tags** for `list_installed_models` use the `localaitools.Capability` type (`capability.go`). The `LocalAIClient.ListInstalledModels` interface takes a typed `Capability`, and the `inproc` switch only accepts canonical values (`"embed"`/`"embedding"` are not aliases — only `CapabilityEmbeddings`).
+- **HTTP error checks** in `httpapi.Client` use `errors.Is(err, ErrHTTPNotFound)`, not substring matches on `err.Error()`. The typed `*HTTPError` carries `StatusCode` and `Body`; add new sentinel errors as needed rather than re-introducing string matching.
+- **Channel sends** to `GalleryService.ModelGalleryChannel` / `BackendGalleryChannel` from inproc clients MUST select on `ctx.Done()` so a cancelled chat completion releases the goroutine. See `inproc.sendModelOp` / `sendBackendOp`.
+- **Disk writes** of model config YAML go through `modeladmin.writeFileAtomic` (temp file + `os.Rename`). `os.WriteFile` truncates on crash and corrupts the model.
+- **MCP server lifecycle**: every initialised holder MUST register `Close()` with `signals.RegisterGracefulTerminationHandler`. The standalone `mcp-server` CLI uses `signal.NotifyContext` to honour SIGINT/SIGTERM.
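+
+The channel-send rule above is the standard `select` on `ctx.Done()`. A minimal sketch, assuming a hypothetical `op` type and `sendOp` helper; the real helpers are `inproc.sendModelOp` / `sendBackendOp`, whose signatures are not reproduced here.
+
+```go
+package main
+
+import (
+	"context"
+	"errors"
+	"fmt"
+)
+
+// op stands in for a gallery operation (hypothetical type).
+type op struct{ Name string }
+
+// sendOp blocks until the op is accepted or ctx is cancelled, so a
+// cancelled request never leaves the goroutine parked on the send.
+func sendOp(ctx context.Context, ch chan<- op, o op) error {
+	select {
+	case ch <- o:
+		return nil
+	case <-ctx.Done():
+		return ctx.Err()
+	}
+}
+
+func main() {
+	ch := make(chan op) // unbuffered, and nobody is receiving
+	ctx, cancel := context.WithCancel(context.Background())
+	cancel() // simulate a cancelled chat completion
+
+	err := sendOp(ctx, ch, op{Name: "install"})
+	fmt.Println(errors.Is(err, context.Canceled)) // true
+}
+```
+
+A bare `ch <- o` in the same situation would block forever, which is the goroutine leak the rule is meant to prevent.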
+
+## File map (where to look)
+
+```
+pkg/mcp/localaitools/
+  client.go              # LocalAIClient interface + DTO registry
+  dto.go                 # JSON-tagged DTOs shared by both client impls
+  server.go              # NewServer(client, opts) — registers tools
+  tools.go               # Tool* name constants (single source of truth)
+  capability.go          # Capability type + constants
+  tools_models.go        # gallery_search, install_model, import_model_uri, ...
+  tools_backends.go
+  tools_config.go
+  tools_system.go
+  tools_state.go
+  prompts.go             # //go:embed loader + SystemPrompt(opts)
+  prompts/00_role.md
+  prompts/10_safety.md   # SAFETY RULES — change with care
+  prompts/20_tools.md    # curated tool catalog with one-liners
+  prompts/skills/*.md
+  inproc/client.go       # in-process LocalAIClient (services-direct)
+  httpapi/client.go      # REST LocalAIClient (for standalone CLI / remote)
+core/http/endpoints/mcp/
+  localai_assistant.go   # process-wide holder + LocalToolExecutor
+core/cli/mcp_server.go   # local-ai mcp-server subcommand
+```
+
+## Why two clients
+
+The in-process MCP server runs inside the same LocalAI binary that serves chat. Going over HTTP loopback would (a) require minting a synthetic admin API key for the server to authenticate against itself, (b) double-marshal every tool dispatch, and (c) lose access to in-process channels (e.g. `GalleryService.ModelGalleryChannel` for streaming install progress). So in-process uses `inproc.Client`. The standalone stdio CLI talks to a *remote* LocalAI; HTTP is the only option, so it uses `httpapi.Client`. Both implement the same `LocalAIClient` interface, and the parity test in `pkg/mcp/localaitools/parity_test.go` (when present) keeps their output equivalent.
+
+## Why prompt-enforced confirmation, not code gates
+
+The user chose KISS. Every mutating tool has a safety rule (`prompts/10_safety.md` rule 1) that requires the LLM to summarise the action and wait for explicit user confirmation before calling it. There is no `plan_*`/`apply_*` two-step in code. If you add a mutating tool, do **not** add per-tool confirmation logic in Go — instead, list the new tool name in `prompts/10_safety.md` so the LLM knows it falls under the confirmation rule.
+
+## Distributed mode
+
+The in-memory MCP server runs only on the head node (where the chat handler runs). `inproc.Client` wraps services that are already distributed-aware (`GalleryService` coordinates with workers; `ListNodes` reads the NATS-populated registry). No NATS routing of MCP tools — the admin surface lives on the head, period.
--- a/.agents/testing-mcp-apps.md
+++ b/.agents/testing-mcp-apps.md
@@ -0,0 +1,120 @@
+# Testing MCP Apps (Interactive Tool UIs)
+
+MCP Apps is an extension to MCP where tools declare interactive HTML UIs via `_meta.ui.resourceUri`. When the LLM calls such a tool, the UI renders the app in a sandboxed iframe inline in the chat. The app communicates bidirectionally with the host via `postMessage` (JSON-RPC) and can call server tools, send messages, and update model context.
+
+Spec: https://modelcontextprotocol.io/extensions/apps/overview
+
+## Quick Start: Run a Test MCP App Server
+
+The `@modelcontextprotocol/server-basic-react` npm package is a ready-to-use test server that exposes a `get-time` tool with an interactive React clock UI. It requires Node >= 20, so run it in Docker:
+
+```bash
+docker run -d --name mcp-app-test -p 3001:3001 node:22-slim \
+  sh -c 'npx -y @modelcontextprotocol/server-basic-react'
+```
+
+Wait ~10 seconds for it to start, then verify:
+
+```bash
+# Check it's running
+docker logs mcp-app-test
+# Expected: "MCP server listening on http://localhost:3001/mcp"
+
+# Verify MCP protocol works
+curl -s -X POST http://localhost:3001/mcp \
+  -H 'Content-Type: application/json' \
+  -H 'Accept: application/json, text/event-stream' \
+  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}'
+
+# List tools — should show get-time with _meta.ui.resourceUri
+curl -s -X POST http://localhost:3001/mcp \
+  -H 'Content-Type: application/json' \
+  -H 'Accept: application/json, text/event-stream' \
+  -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
+```
+
+The `tools/list` response should contain:
+```json
+{
+  "name": "get-time",
+  "_meta": {
+    "ui": { "resourceUri": "ui://get-time/mcp-app.html" }
+  }
+}
+```
+
+## Testing in LocalAI's UI
+
+1. Make sure LocalAI is running (e.g. `http://localhost:8080`)
+2. Build the React UI: `cd core/http/react-ui && npm install && npm run build`
+3. Open the Chat page in your browser
+4. Click **"Client MCP"** in the chat header
+5. Add a new client MCP server:
+   - **URL**: `http://localhost:3001/mcp`
+   - **Use CORS proxy**: enabled (default) — required because the browser can't hit `localhost:3001` directly due to CORS; LocalAI's proxy at `/api/cors-proxy` handles it
+6. The server should connect and discover the `get-time` tool
+7. Select a model and send: **"What time is it?"**
+8. The LLM should call the `get-time` tool
+9. The tool result should render the interactive React clock app in an iframe as a standalone chat message (not inside the collapsed activity group)
+
+## What to Verify
+
+- [ ] Tool appears in the connected tools list (not filtered — `get-time` is callable by the LLM)
+- [ ] The iframe renders as a standalone chat message with a puzzle-piece icon
+- [ ] The app loads and is interactive (clock UI, buttons work)
+- [ ] No "Reconnect to MCP server" overlay (connection is live)
+- [ ] Console logs show bidirectional communication:
+  - `tools/call` messages from app to host (app calling server tools)
+  - `ui/message` notifications (app sending messages)
+- [ ] After the app renders, the LLM continues and produces a text response with the time
+- [ ] Non-UI tools continue to work normally (text-only results)
+- [ ] Page reload shows the HTML statically with a reconnect overlay until you reconnect
+
+## Console Log Patterns
+
+Healthy bidirectional communication looks like:
+
+```
+Parsed message { jsonrpc: "2.0", id: N, result: {...} }     // Bridge init
+get-time result: { content: [...] }                          // Tool result received
+Calling get-time tool...                                     // App calls tool
+Sending message { method: "tools/call", ... }                // App -> host -> server
+Parsed message { jsonrpc: "2.0", id: N, result: {...} }     // Server response
+Sending message text to Host: ...                            // App sends message
+Sending message { method: "ui/message", ... }                // Message notification
+Message accepted                                             // Host acknowledged
+```
+
+Benign warnings to ignore:
+- `Source map error: ... about:srcdoc` — browser devtools can't find source maps for srcdoc iframes
+- `Ignoring message from unknown source` — duplicate postMessage from iframe navigation
+- `notifications/cancelled` — app cleaning up previous requests
+
+## Architecture Notes
+
+- **No server-side changes needed** — the MCP App protocol runs entirely in the browser
+- `PostMessageTransport` wraps `window.postMessage` between host and `srcdoc` iframe
+- `AppBridge` (from `@modelcontextprotocol/ext-apps`) auto-forwards `tools/call`, `resources/read`, `resources/list` from the app to the MCP server via the host's `Client`
+- The iframe uses `sandbox="allow-scripts allow-forms"` (no `allow-same-origin`) — opaque origin, no access to host cookies/DOM/localStorage
+- App-only tools (`_meta.ui.visibility: "app-only"`) are filtered from the LLM's tool list but remain callable by the app iframe
+
+## Key Files
+
+- `core/http/react-ui/src/components/MCPAppFrame.jsx` — iframe + AppBridge component
+- `core/http/react-ui/src/hooks/useMCPClient.js` — MCP client hook with app UI helpers (`hasAppUI`, `getAppResource`, `getClientForTool`, `getToolDefinition`)
+- `core/http/react-ui/src/hooks/useChat.js` — agentic loop, attaches `appUI` to tool_result messages
+- `core/http/react-ui/src/pages/Chat.jsx` — renders MCPAppFrame as standalone chat messages
+
+## Other Test Servers
+
+The `@modelcontextprotocol/ext-apps` repo has many example servers:
+- `@modelcontextprotocol/server-basic-react` — simple clock (React)
+- More examples at https://github.com/modelcontextprotocol/ext-apps/tree/main/examples
+
+All examples support both stdio and HTTP transport. Run without `--stdio` for HTTP mode on port 3001.
+
+## Cleanup
+
+```bash
+docker rm -f mcp-app-test
+```
--- a/.agents/vllm-backend.md
+++ b/.agents/vllm-backend.md
@@ -0,0 +1,115 @@
+# Working on the vLLM Backend
+
+The vLLM backend lives at `backend/python/vllm/backend.py` (async gRPC); the multimodal variant lives at `backend/python/vllm-omni/backend.py` (sync gRPC). Both wrap vLLM's `AsyncLLMEngine` / `Omni`, translating LocalAI's gRPC `PredictOptions` into vLLM `SamplingParams` and the engine outputs into `Reply.chat_deltas`.
+
+This file captures the non-obvious bits — most of the bring-up was a single PR (`feat/vllm-parity`) and the things below are easy to get wrong.
+
+## Tool calling and reasoning use vLLM's *native* parsers
+
+Do not write regex-based tool-call extractors for vLLM. vLLM ships:
+
+- `vllm.tool_parsers.ToolParserManager` — 50+ registered parsers (`hermes`, `llama3_json`, `llama4_pythonic`, `mistral`, `qwen3_xml`, `deepseek_v3`, `granite4`, `openai`, `kimi_k2`, `glm45`, …)
+- `vllm.reasoning.ReasoningParserManager` — 25+ registered parsers (`deepseek_r1`, `qwen3`, `mistral`, `gemma4`, …)
+
+Both can be used standalone: instantiate with a tokenizer, call `extract_tool_calls(text, request=None)` / `extract_reasoning(text, request=None)`. The backend stores the parser *classes* on `self.tool_parser_cls` / `self.reasoning_parser_cls` at LoadModel time and instantiates them per request.
+
+**Selection:** vLLM does *not* auto-detect parsers from model name — neither does the LocalAI backend. The user (or `core/config/hooks_vllm.go`) must pick one and pass it via `Options[]`:
+
+```yaml
+options:
+  - tool_parser:hermes
+  - reasoning_parser:qwen3
+```
+
+Auto-defaults for known model families live in `core/config/parser_defaults.json` and are applied:
+- at gallery import time by `core/gallery/importers/vllm.go`
+- at model load time by the `vllm` / `vllm-omni` backend hook in `core/config/hooks_vllm.go`
+
+User-supplied `tool_parser:`/`reasoning_parser:` in the config wins over defaults — the hook checks for existing entries before appending.
+
+**When to update `parser_defaults.json`:** any time vLLM ships a new tool or reasoning parser, or you onboard a new model family that LocalAI users will pull from HuggingFace. The file is keyed by *family pattern* matched against `normalizeModelID(cfg.Model)` (lowercase, org-prefix stripped, `_`→`-`). Patterns are checked **longest-first** — keep `qwen3.5` before `qwen3`, `llama-3.3` before `llama-3`, etc., or the wrong family wins. Add a covering test in `core/config/hooks_test.go`.
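+
+Why the ordering matters can be seen in a small sketch. The `normalize` and `matchFamily` helpers below are hypothetical stand-ins, not the code in `core/config`: the sketch assumes the org prefix is everything up to the last `/` and that a pattern matches when it is a substring of the normalized ID.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// normalize applies the rules described above: lowercase,
+// strip the org prefix, and replace "_" with "-".
+func normalize(id string) string {
+	id = strings.ToLower(id)
+	if i := strings.LastIndex(id, "/"); i >= 0 {
+		id = id[i+1:]
+	}
+	return strings.ReplaceAll(id, "_", "-")
+}
+
+// matchFamily returns the first pattern, in the order given,
+// that occurs in the normalized model ID.
+func matchFamily(id string, patterns []string) (string, bool) {
+	n := normalize(id)
+	for _, p := range patterns {
+		if strings.Contains(n, p) {
+			return p, true
+		}
+	}
+	return "", false
+}
+
+func main() {
+	id := "Qwen/Qwen3.5_7B-Instruct" // normalizes to "qwen3.5-7b-instruct"
+
+	good, _ := matchFamily(id, []string{"qwen3.5", "qwen3"})
+	bad, _ := matchFamily(id, []string{"qwen3", "qwen3.5"})
+
+	fmt.Println(good) // qwen3.5
+	fmt.Println(bad)  // qwen3: the shorter pattern shadows the longer one
+}
+```
+
+With first-match semantics like these, a shorter pattern listed first always wins, which is what a covering test in `hooks_test.go` should guard against.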
+
+**Sister file — `core/config/inference_defaults.json`:** same pattern but for sampling parameters (temperature, top_p, top_k, min_p, repeat_penalty, presence_penalty). Loaded by `core/config/inference_defaults.go` and applied by `ApplyInferenceDefaults()`. The schema is `map[string]float64` only — *strings don't fit*, which is why parser defaults needed their own JSON file. The inference file is **auto-generated from unsloth** via `go generate ./core/config/` (see `core/config/gen_inference_defaults/`) — don't hand-edit it; instead update the upstream source or regenerate. Both files share `normalizeModelID()` and the longest-first pattern ordering.
+
+**Constructor compatibility gotcha:** the abstract `ToolParser.__init__` accepts `tools=`, but several concrete parsers (`Hermes2ProToolParser`, etc.) override `__init__` and *only* accept `tokenizer`. Always try the `tools=` form first and fall back on `TypeError`:
+
+```python
+try:
+    tp = self.tool_parser_cls(self.tokenizer, tools=tools)
+except TypeError:
+    tp = self.tool_parser_cls(self.tokenizer)
+```
+
+## ChatDelta is the streaming contract
+
+The Go side (`core/backend/llm.go`, `pkg/functions/chat_deltas.go`) consumes `Reply.chat_deltas` to assemble the OpenAI response. For tool calls to surface in `chat/completions`, the Python backend **must** populate `Reply.chat_deltas[].tool_calls` with `ToolCallDelta{index, id, name, arguments}`. Returning the raw `<tool_call>...</tool_call>` text in `Reply.message` is *not* enough — the Go regex fallback exists for llama.cpp, not for vllm.
+
+Same story for `reasoning_content` — emit it on `ChatDelta.reasoning_content`, not as part of `content`.
+
+## Message conversion to chat templates
+
+`tokenizer.apply_chat_template()` expects a list of dicts, not proto Messages. The shared helper in `backend/python/common/vllm_utils.py` (`messages_to_dicts`) handles the mapping including:
+
+- `tool_call_id` and `name` for `role="tool"` messages
+- `tool_calls` JSON-string field → parsed Python list for `role="assistant"`
+- `reasoning_content` for thinking models
+
+Pass `tools=json.loads(request.Tools)` and (when `request.Metadata.get("enable_thinking") == "true"`) `enable_thinking=True` to `apply_chat_template`. Wrap in `try/except TypeError` because not every tokenizer template accepts those kwargs.
+
+## CPU support and the SIMD/library minefield
+
+vLLM publishes prebuilt CPU wheels at `https://github.com/vllm-project/vllm/releases/...`. The pin lives in `backend/python/vllm/requirements-cpu-after.txt`.
+
+**Version compatibility (important):** newer vLLM CPU wheels (≥ 0.15) declare `torch==2.10.0+cpu` as a hard dependency, but `torch==2.10.0` only exists on the PyTorch test channel and pulls in an incompatible `torchvision`. Stay on **`vllm 0.14.1+cpu` + `torch 2.9.1+cpu`** until both upstreams catch up. Before bumping, verify that the torchvision/torchaudio versions match.
+
+`requirements-cpu.txt` uses `--extra-index-url https://download.pytorch.org/whl/cpu`. `install.sh` adds `--index-strategy=unsafe-best-match` for the `cpu` profile so uv resolves transformers/vllm from PyPI while pulling torch from the PyTorch index.
+
+**SIMD baseline:** the prebuilt CPU wheel is compiled with AVX-512 VNNI/BF16. On a CPU without those instructions, importing `vllm.model_executor.models.registry` SIGILLs at `_run_in_subprocess` time during model inspection. There is no runtime flag to disable it. Workarounds:
+
+1. **Run on a host with the right SIMD baseline** (default — fast)
+2. **Build from source** with `FROM_SOURCE=true` env var. Plumbing exists end-to-end:
+   - `install.sh` hides `requirements-cpu-after.txt`, runs `installRequirements` for the base deps, then clones vllm and `VLLM_TARGET_DEVICE=cpu uv pip install --no-deps .`
+   - `backend/Dockerfile.python` declares `ARG FROM_SOURCE` + `ENV FROM_SOURCE`
+   - `Makefile` `docker-build-backend` macro forwards `--build-arg FROM_SOURCE=$(FROM_SOURCE)` when set
+   - Source build takes 30–50 minutes — too slow for per-PR CI but fine for local.
+
+**Runtime shared libraries:** vLLM's `vllm._C` extension `dlopen`s `libnuma.so.1` at import time. If missing, the C extension silently fails and `torch.ops._C_utils.init_cpu_threads_env` is never registered → `EngineCore` crashes on `init_device` with:
+
+```
+AttributeError: '_OpNamespace' '_C_utils' object has no attribute 'init_cpu_threads_env'
+```
+
+`backend/python/vllm/package.sh` bundles `libnuma.so.1` and `libgomp.so.1` into `${BACKEND}/lib/`, which `libbackend.sh` adds to `LD_LIBRARY_PATH` at run time. The builder stage in `backend/Dockerfile.python` installs `libnuma1`/`libgomp1` so package.sh has something to copy. Do *not* assume the production host has these — backend images are `FROM scratch`.
+
+## Backend hook system (`core/config/backend_hooks.go`)
+
+Per-backend defaults that used to be hardcoded in `ModelConfig.Prepare()` now live in `core/config/hooks_*.go` files and self-register via `init()`:
+
+- `hooks_llamacpp.go` → GGUF metadata parsing, context size, GPU layers, jinja template
+- `hooks_vllm.go` → tool/reasoning parser auto-selection from `parser_defaults.json`
+
+Hook keys:
+- `"llama-cpp"`, `"vllm"`, `"vllm-omni"`, … — backend-specific
+- `""` — runs only when `cfg.Backend` is empty (auto-detect case)
+- `"*"` — global catch-all, runs for every backend before specific hooks
+
+Multiple hooks per key are supported and run in registration order. Adding a new backend default:
+
+```go
+// core/config/hooks_<backend>.go
+func init() {
+    RegisterBackendHook("<backend>", myDefaults)
+}
+func myDefaults(cfg *ModelConfig, modelPath string) {
+    // only fill in fields the user didn't set
+}
+```
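+
+How the hook keys compose at dispatch time can be pictured with a toy registry. Everything below is hypothetical (`register`, `runHooks`, the trimmed `ModelConfig`); it only mirrors the ordering rules listed above and is not the code in `backend_hooks.go`.
+
+```go
+package main
+
+import "fmt"
+
+// ModelConfig is a trimmed stand-in for the real config type.
+type ModelConfig struct {
+	Backend     string
+	ContextSize int
+}
+
+type hook func(cfg *ModelConfig, modelPath string)
+
+// hooks maps a backend key to its hooks, kept in registration order.
+var hooks = map[string][]hook{}
+
+func register(key string, h hook) { hooks[key] = append(hooks[key], h) }
+
+// runHooks runs the "*" hooks first, then the hooks for cfg.Backend.
+// An empty Backend selects the "" key (the auto-detect case).
+func runHooks(cfg *ModelConfig, modelPath string) {
+	for _, h := range hooks["*"] {
+		h(cfg, modelPath)
+	}
+	for _, h := range hooks[cfg.Backend] {
+		h(cfg, modelPath)
+	}
+}
+
+func main() {
+	var order []string
+	register("vllm", func(cfg *ModelConfig, _ string) {
+		order = append(order, "vllm")
+		if cfg.ContextSize == 0 { // only fill in what the user left unset
+			cfg.ContextSize = 4096
+		}
+	})
+	register("*", func(*ModelConfig, string) { order = append(order, "*") })
+
+	cfg := &ModelConfig{Backend: "vllm", ContextSize: 8192}
+	runHooks(cfg, "")
+	fmt.Println(order, cfg.ContextSize) // [* vllm] 8192
+}
+```
+
+The user-supplied `ContextSize` survives because the hook checks for a zero value before writing, which is the same discipline the `myDefaults` stub asks for.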
+
+## The `Messages.ToProto()` fields you need to set
+
+`core/schema/message.go:ToProto()` must serialize:
+- `ToolCallID` → `proto.Message.ToolCallId` (for `role="tool"` messages — links result back to the call)
+- `Reasoning` → `proto.Message.ReasoningContent`
+- `ToolCalls` → `proto.Message.ToolCalls` (JSON-encoded string)
+
+These were originally not serialized and tool-calling conversations broke silently — the C++ llama.cpp backend reads them but always got empty strings. Any new field added to `schema.Message` *and* `proto.Message` needs a matching line in `ToProto()`.
--- a/.air.toml
+++ b/.air.toml
@@ -0,0 +1,8 @@
+# .air.toml
+[build]
+cmd = "make build"
+bin = "./local-ai"
+args_bin = [ "--debug" ]
+include_ext = ["go", "html", "yaml", "toml", "json", "txt", "md"]
+exclude_dir = ["pkg/grpc/proto"]
+delay = 1000
--- a/Generation/musicgen.bru
+++ b/Generation/musicgen.bru
@@ -1,23 +0,0 @@
-meta {
-  name: musicgen
-  type: http
-  seq: 1
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/v1/sound-generation
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-      "model_id": "facebook/musicgen-small",
-      "text": "Exciting 80s Newscast Interstitial",
-      "duration_seconds": 8
-  }
-}
--- a/Requests/backend
+++ b/Requests/backend
@@ -1,17 +0,0 @@
-meta {
-  name: backend monitor
-  type: http
-  seq: 4
-}
-
-get {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/backend/monitor
-  body: json
-  auth: none
-}
-
-body:json {
-  {
-    "model": "{{DEFAULT_MODEL}}"
-  }
-}
--- a/monitor/backend-shutdown.bru
+++ b/monitor/backend-shutdown.bru
@@ -1,21 +0,0 @@
-meta {
-  name: backend-shutdown
-  type: http
-  seq: 3
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/backend/shutdown
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-      "model": "{{DEFAULT_MODEL}}"
-  }
-}
--- a/Requests/bruno.json
+++ b/Requests/bruno.json
@@ -1,5 +0,0 @@
-{
-  "version": "1",
-  "name": "LocalAI Test Requests",
-  "type": "collection"
-}
--- a/Requests/environments/localhost.bru
+++ b/Requests/environments/localhost.bru
@@ -1,6 +0,0 @@
-vars {
-  HOST: localhost
-  PORT: 8080
-  DEFAULT_MODEL: gpt-3.5-turbo
-  PROTOCOL: http://
-}
--- a/.bruno/LocalAI
+++ b/.bruno/LocalAI
@@ -1,11 +0,0 @@
-meta {
-  name: get models list
-  type: http
-  seq: 2
-}
-
-get {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models
-  body: none
-  auth: none
-}
--- a/generation/Generate
+++ b/generation/Generate
@@ -1,25 +0,0 @@
-meta {
-  name: Generate image
-  type: http
-  seq: 1
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/v1/images/generations
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-    "prompt": "<positive prompt>|<negative prompt>",
-    "model": "model-name",
-    "step": 51,
-    "size": "1024x1024",
-    "image": ""
-  }
-}
--- a/text/-completions.bru
+++ b/text/-completions.bru
@@ -1,24 +0,0 @@
-meta {
-  name: -completions
-  type: http
-  seq: 4
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/completions
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-      "model": "{{DEFAULT_MODEL}}",
-      "prompt": "function downloadFile(string url, string outputPath) {",
-      "max_tokens": 256,
-      "temperature": 0.5
-  }
-}
--- a/text/-edits.bru
+++ b/text/-edits.bru
@@ -1,23 +0,0 @@
-meta {
-  name: -edits
-  type: http
-  seq: 5
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/edits
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-      "model": "{{DEFAULT_MODEL}}",
-      "input": "What day of the wek is it?",
-      "instruction": "Fix the spelling mistakes"
-  }
-}
--- a/text/-embeddings.bru
+++ b/text/-embeddings.bru
@@ -1,22 +0,0 @@
-meta {
-  name: -embeddings
-  type: http
-  seq: 6
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/embeddings
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-      "model": "{{DEFAULT_MODEL}}",
-      "input": "A STRANGE GAME.\nTHE ONLY WINNING MOVE IS NOT TO PLAY.\n\nHOW ABOUT A NICE GAME OF CHESS?"
-  }
-}
--- a/text/chat/chat
+++ b/text/chat/chat
@@ -1,30 +0,0 @@
-meta {
-  name: chat completion -simple- 1 message-
-  type: http
-  seq: 4
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/chat/completions
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-    "model": "{{DEFAULT_MODEL}}",
-    "messages": [
-      {
-        "role": "user",
-        "content": "How could one use friction to cook an egg?"
-      }
-    ],
-    "max_tokens": 256,
-    "temperature": 0.2,
-    "grammar": ""
-  }
-}
--- a/text/chat/chat-completions
+++ b/text/chat/chat-completions
@@ -1,29 +0,0 @@
-meta {
-  name: chat-completions -long-
-  type: http
-  seq: 5
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/chat/completions
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-       "model": "{{DEFAULT_MODEL}}",
-       "messages": [{"role": "system", "content": "You are a helpful, intelligent pirate assistant. Always answer as helpfully as possible, while being a pirate and using pirate language. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."},
-           {"role": "user", "content": "How could one use electricity to cook an egg?"},
-           {"role": "assistant",
-                  "content": "Shiver me timbers! Using electricity to cook an egg? Well, matey, I reckon that be a right curious idea! *adjusts spectacles* Now, I ain't sure if this be possible, but I'll do me best to help ye out.\n\nFirst things first, ye gotta understand that electricity be a powerful force, and it can't just cook an egg on its own. Ye see, electricity be like a mighty wind that can make things happen, but it needs somethin' to work with. So, if ye want to use electricity to cook an egg, ye gotta find a way to harness that power and make it do the cookin'.\n\nNow, I know what ye might be thinkin': \"How do I harness the power of electricity to cook an egg?\" Well, matey, there be a few ways to do it. One way be to use a special device called an \"electric frying pan.\" This be a pan that has a built-in heating element that gets hot when ye plug it into a wall socket. When the element gets hot, ye can crack an egg into the pan and watch as it cook"
-              },
-              {"role": "user", "content": "I don't have one of those, just a raw wire and plenty of power! How do we get it done?"}],
-       "max_tokens": 1024,
-       "temperature": 0.5
-  }
-}
--- a/text/chat/chat-completions
+++ b/text/chat/chat-completions
@@ -1,25 +0,0 @@
-meta {
-  name: chat-completions -stream-
-  type: http
-  seq: 6
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/chat/completions
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-       "model": "{{DEFAULT_MODEL}}",
-       "messages": [{"role": "user", "content": "Explain how I can set sail on the ocean using only power generated by seagulls?"}],
-       "max_tokens": 256,
-       "temperature": 0.9,
-       "stream": true
-  }
-}
--- a/Requests/model
+++ b/Requests/model
@@ -1,22 +0,0 @@
-meta {
-  name: add model gallery
-  type: http
-  seq: 10
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/galleries
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-      "url": "file:///home/dave/projects/model-gallery/huggingface/TheBloke__CodeLlama-7B-Instruct-GGML.yaml",
-      "name": "test"
-  }
-}
--- a/gallery/delete
+++ b/gallery/delete
@@ -1,21 +0,0 @@
-meta {
-  name: delete model gallery
-  type: http
-  seq: 11
-}
-
-delete {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/galleries
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-      "name": "test"
-  }
-}
--- a/Requests/model
+++ b/Requests/model
@@ -1,11 +0,0 @@
-meta {
-  name: list MODELS in galleries
-  type: http
-  seq: 7
-}
-
-get {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/available
-  body: none
-  auth: none
-}
--- a/Requests/model
+++ b/Requests/model
@@ -1,11 +0,0 @@
-meta {
-  name: list model GALLERIES
-  type: http
-  seq: 8
-}
-
-get {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/galleries
-  body: none
-  auth: none
-}
--- a/Requests/model
+++ b/Requests/model
@@ -1,11 +0,0 @@
-meta {
-  name: model delete
-  type: http
-  seq: 7
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/galleries
-  body: none
-  auth: none
-}
--- a/Requests/model
+++ b/Requests/model
@@ -1,21 +0,0 @@
-meta {
-  name: model gallery apply -gist-
-  type: http
-  seq: 12
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/apply
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-      "id": "TheBloke__CodeLlama-7B-Instruct-GGML__codellama-7b-instruct.ggmlv3.Q2_K.bin"
-  }
-}
--- a/Requests/model
+++ b/Requests/model
@@ -1,22 +0,0 @@
-meta {
-  name: model gallery apply
-  type: http
-  seq: 9
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/models/apply
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-      "id": "dave@TheBloke__CodeLlama-7B-Instruct-GGML__codellama-7b-instruct.ggmlv3.Q3_K_S.bin",
-      "name": "codellama7b"
-  }
-}
--- a/Requests/transcription/gb1.ogg
+++ b/Requests/transcription/gb1.ogg
--- a/Requests/transcription/transcribe.bru
+++ b/Requests/transcription/transcribe.bru
@@ -1,16 +0,0 @@
-meta {
-  name: transcribe
-  type: http
-  seq: 1
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/v1/audio/transcriptions
-  body: multipartForm
-  auth: none
-}
-
-body:multipart-form {
-  file: @file(transcription/gb1.ogg)
-  model: whisper-1
-}
--- a/Requests/tts/-tts.bru
+++ b/Requests/tts/-tts.bru
@@ -1,22 +0,0 @@
-meta {
-  name: -tts
-  type: http
-  seq: 2
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/tts
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-      "model": "{{DEFAULT_MODEL}}",
-      "input": "A STRANGE GAME.\nTHE ONLY WINNING MOVE IS NOT TO PLAY.\n\nHOW ABOUT A NICE GAME OF CHESS?"
-  }
-}
--- a/Requests/tts/musicgen.bru
+++ b/Requests/tts/musicgen.bru
@@ -1,23 +0,0 @@
-meta {
-  name: musicgen
-  type: http
-  seq: 2
-}
-
-post {
-  url: {{PROTOCOL}}{{HOST}}:{{PORT}}/tts
-  body: json
-  auth: none
-}
-
-headers {
-  Content-Type: application/json
-}
-
-body:json {
-  {
-      "backend": "transformers",
-      "model": "facebook/musicgen-small",
-      "input": "80s Synths playing Jazz"
-  }
-}
--- a/.devcontainer-scripts/poststart.sh
+++ b/.devcontainer-scripts/poststart.sh
@@ -2,9 +2,6 @@

 cd /workspace

-# Grab the pre-stashed backend assets to avoid build issues
-cp -r /build/backend-assets /workspace/backend-assets
-
 # Ensures generated source files are present upon load
 make prepare

--- a/.devcontainer/docker-compose-devcontainer.yml
+++ b/.devcontainer/docker-compose-devcontainer.yml
@@ -4,17 +4,14 @@ services:
      context: ..
      dockerfile: Dockerfile
      target: devcontainer
-      args:
-      - FFMPEG=true
-      - IMAGE_TYPE=extras
-      - GO_TAGS=p2p tts
    env_file:
      - ../.env
    ports:
      - 8080:8080
    volumes:
      - localai_workspace:/workspace
-      - ../models:/host-models
+      - models:/host-models
+      - backends:/host-backends
      - ./customization:/devcontainer-customization
    command: /bin/sh -c "while sleep 1000; do :; done"
    cap_add:
@@ -43,6 +40,9 @@ services:
      - GF_SECURITY_ADMIN_PASSWORD=grafana
    volumes:
      - ./grafana:/etc/grafana/provisioning/datasources
+
 volumes:
  prom_data:
-  localai_workspace:
+  localai_workspace:
+  models:
+  backends:
--- a/.docker/apt-mirror.sh
+++ b/.docker/apt-mirror.sh
@@ -0,0 +1,39 @@
+#!/bin/sh
+# Reconfigure Ubuntu apt sources to point at an alternate mirror.
+#
+# Used by Dockerfiles via `RUN --mount=type=bind,source=.docker/apt-mirror.sh,...`
+# and by CI workflows on the runner to mitigate outages of the default
+# archive.ubuntu.com / security.ubuntu.com / ports.ubuntu.com pool.
+#
+# Inputs (env):
+#   APT_MIRROR        Replacement for archive.ubuntu.com and security.ubuntu.com
+#                     (e.g. "http://azure.archive.ubuntu.com" or
+#                      "https://mirrors.edge.kernel.org").
+#                     Leave empty to keep upstream. The trailing "/ubuntu/..."
+#                     path is preserved by the rewrite.
+#   APT_PORTS_MIRROR  Replacement for ports.ubuntu.com (arm64/ppc64el/...).
+#                     Leave empty to keep upstream.
+#
+# Both default to empty, in which case the script is a no-op.
+
+set -e
+
+if [ -z "${APT_MIRROR}" ] && [ -z "${APT_PORTS_MIRROR}" ]; then
+    exit 0
+fi
+
+# Ubuntu 24.04 (noble) ships DEB822 sources at /etc/apt/sources.list.d/ubuntu.sources;
+# older releases use /etc/apt/sources.list. We rewrite whichever exists.
+for f in /etc/apt/sources.list.d/ubuntu.sources /etc/apt/sources.list; do
+    [ -f "$f" ] || continue
+    if [ -n "${APT_MIRROR}" ]; then
+        # Use a comma delimiter so the alternation pipe in the regex
+        # is not interpreted as the s/// separator.
+        sed -i -E "s,https?://(archive\.ubuntu\.com|security\.ubuntu\.com),${APT_MIRROR},g" "$f"
+    fi
+    if [ -n "${APT_PORTS_MIRROR}" ]; then
+        sed -i -E "s,https?://ports\.ubuntu\.com,${APT_PORTS_MIRROR},g" "$f"
+    fi
+done
+
+echo "apt-mirror: rewrote sources (APT_MIRROR='${APT_MIRROR}', APT_PORTS_MIRROR='${APT_PORTS_MIRROR}')"
--- a/.dockerignore
+++ b/.dockerignore
@@ -3,7 +3,13 @@
 .vscode
 .devcontainer
 models
+backends
 examples/chatbot-ui/models
+backend/go/image/stablediffusion-ggml/build/
+backend/go/*/build
+backend/go/*/.cache
+backend/go/*/sources
+backend/go/*/package
 examples/rwkv/models
 examples/**/models
 Dockerfile*
@@ -14,4 +20,4 @@ __pycache__

 # backend virtual environments
 **/venv
-backend/python/**/source
+backend/python/**/source
--- a/.env
+++ b/.env
@@ -26,24 +26,14 @@
 ## Disables COMPEL (Diffusers)
 # COMPEL=0

+## Disables SD_EMBED (Diffusers)
+# SD_EMBED=0
+
 ## Enable/Disable single backend (useful if only one GPU is available)
 # LOCALAI_SINGLE_ACTIVE_BACKEND=true

-## Specify a build type. Available: cublas, openblas, clblas.
-## cuBLAS: This is a GPU-accelerated version of the complete standard BLAS (Basic Linear Algebra Subprograms) library. It's provided by Nvidia and is part of their CUDA toolkit.
-## OpenBLAS: This is an open-source implementation of the BLAS library that aims to provide highly optimized code for various platforms. It includes support for multi-threading and can be compiled to use hardware-specific features for additional performance. OpenBLAS can run on many kinds of hardware, including CPUs from Intel, AMD, and ARM.
-## clBLAS:   This is an open-source implementation of the BLAS library that uses OpenCL, a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. clBLAS is designed to take advantage of the parallel computing power of GPUs but can also run on any hardware that supports OpenCL. This includes hardware from different vendors like Nvidia, AMD, and Intel.
-# BUILD_TYPE=openblas
-
-## Uncomment and set to true to enable rebuilding from source
-# REBUILD=true
-
-## Enable go tags, available: p2p, tts
-## p2p: enable distributed inferencing
-## tts: enables text-to-speech with go-piper 
-## (requires REBUILD=true)
-#
-# GO_TAGS=p2p
+## Forces shutdown of the backends if busy (only if LOCALAI_SINGLE_ACTIVE_BACKEND is set)
+# LOCALAI_FORCE_BACKEND_SHUTDOWN=true

 ## Path where to store generated images
 # LOCALAI_IMAGE_PATH=/tmp/generated/images
@@ -73,7 +63,7 @@

 ### Define a list of GRPC Servers for llama-cpp workers to distribute the load
 # https://github.com/ggerganov/llama.cpp/pull/6829
-# https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/README.md
+# https://github.com/ggerganov/llama.cpp/blob/master/tools/rpc/README.md
 # LLAMACPP_GRPC_SERVERS=""

 ### Enable to run parallel requests
--- a/.github/actions/configure-apt-mirror/action.yml
+++ b/.github/actions/configure-apt-mirror/action.yml
@@ -0,0 +1,91 @@
+name: 'Configure apt mirror'
+description: |
+  Reconfigure the GitHub Actions runner's Ubuntu apt sources to use an
+  alternate mirror, and emit the effective URLs as outputs so callers can
+  forward them as Docker build-args.
+
+  Two mirror profiles are used, depending on where the runner lives,
+  because the best mirror differs by network:
+
+    * github-hosted runners run on Azure, so they default to the
+      Azure-hosted Ubuntu mirror (lowest latency, same VPC).
+    * self-hosted runners (arc-runner-set, bigger-runner, ...) typically
+      cannot route to azure.archive.ubuntu.com, so they default to the
+      kernel.org mirror, which is publicly reachable from anywhere.
+
+  Pass an empty string to either input to skip the rewrite for that
+  profile and keep upstream archive.ubuntu.com / ports.ubuntu.com.
+
+inputs:
+  github-hosted-mirror:
+    description: 'archive/security mirror URL for github-hosted runners (empty = upstream)'
+    required: false
+    default: 'http://azure.archive.ubuntu.com'
+  github-hosted-ports-mirror:
+    description: 'ports.ubuntu.com mirror URL for github-hosted runners (empty = upstream)'
+    required: false
+    default: 'http://azure.ports.ubuntu.com'
+  self-hosted-mirror:
+    description: 'archive/security mirror URL for self-hosted runners (empty = upstream)'
+    required: false
+    default: 'https://mirrors.edge.kernel.org'
+  self-hosted-ports-mirror:
+    description: 'ports.ubuntu.com mirror URL for self-hosted runners (empty = upstream)'
+    required: false
+    default: 'https://mirrors.edge.kernel.org'
+
+outputs:
+  effective-mirror:
+    description: 'The mirror URL actually applied for this runner (or empty)'
+    value: ${{ steps.pick.outputs.mirror }}
+  effective-ports-mirror:
+    description: 'The ports mirror URL actually applied for this runner (or empty)'
+    value: ${{ steps.pick.outputs.ports-mirror }}
+
+runs:
+  using: 'composite'
+  steps:
+    - name: Pick effective mirror for this runner
+      id: pick
+      shell: bash
+      env:
+        RUNNER_ENV: ${{ runner.environment }}
+        GH_MIRROR: ${{ inputs.github-hosted-mirror }}
+        GH_PORTS_MIRROR: ${{ inputs.github-hosted-ports-mirror }}
+        SH_MIRROR: ${{ inputs.self-hosted-mirror }}
+        SH_PORTS_MIRROR: ${{ inputs.self-hosted-ports-mirror }}
+      run: |
+        if [ "${RUNNER_ENV}" = "github-hosted" ]; then
+          MIRROR="${GH_MIRROR}"
+          PORTS_MIRROR="${GH_PORTS_MIRROR}"
+        else
+          MIRROR="${SH_MIRROR}"
+          PORTS_MIRROR="${SH_PORTS_MIRROR}"
+        fi
+        echo "configure-apt-mirror: runner=${RUNNER_ENV} mirror='${MIRROR}' ports-mirror='${PORTS_MIRROR}'"
+        echo "mirror=${MIRROR}" >> "$GITHUB_OUTPUT"
+        echo "ports-mirror=${PORTS_MIRROR}" >> "$GITHUB_OUTPUT"
+
+    - name: Rewrite apt sources
+      if: steps.pick.outputs.mirror != '' || steps.pick.outputs.ports-mirror != ''
+      shell: bash
+      env:
+        APT_MIRROR: ${{ steps.pick.outputs.mirror }}
+        APT_PORTS_MIRROR: ${{ steps.pick.outputs.ports-mirror }}
+      run: |
+        set -e
+        # Ubuntu 24.04 (noble) ships DEB822 sources at
+        # /etc/apt/sources.list.d/ubuntu.sources; older releases use
+        # /etc/apt/sources.list. Rewrite whichever exists.
+        for f in /etc/apt/sources.list.d/ubuntu.sources /etc/apt/sources.list; do
+          sudo test -f "$f" || continue
+          if [ -n "${APT_MIRROR}" ]; then
+            # Comma delimiter so the alternation pipe in the regex is not
+            # interpreted as the s/// separator.
+            sudo sed -i -E "s,https?://(archive\.ubuntu\.com|security\.ubuntu\.com),${APT_MIRROR},g" "$f"
+          fi
+          if [ -n "${APT_PORTS_MIRROR}" ]; then
+            sudo sed -i -E "s,https?://ports\.ubuntu\.com,${APT_PORTS_MIRROR},g" "$f"
+          fi
+        done
+        echo "Runner apt mirror configured (APT_MIRROR='${APT_MIRROR}', APT_PORTS_MIRROR='${APT_PORTS_MIRROR}')"
--- a/.github/bump_deps.sh
+++ b/.github/bump_deps.sh
@@ -3,15 +3,20 @@ set -xe
 REPO=$1
 BRANCH=$2
 VAR=$3
+FILE=$4
+
+if [ -z "$FILE" ]; then
+    FILE="Makefile"
+fi

 LAST_COMMIT=$(curl -s -H "Accept: application/vnd.github.VERSION.sha" "https://api.github.com/repos/$REPO/commits/$BRANCH")

 # Read $VAR from Makefile (only first match)
 set +e
-CURRENT_COMMIT="$(grep -m1 "^$VAR?=" Makefile | cut -d'=' -f2)"
+CURRENT_COMMIT="$(grep -m1 "^$VAR?=" "$FILE" | cut -d'=' -f2)"
 set -e

-sed -i Makefile -e "s/$VAR?=.*/$VAR?=$LAST_COMMIT/"
+sed -i "$FILE" -e "s/$VAR?=.*/$VAR?=$LAST_COMMIT/"

 if [ -z "$CURRENT_COMMIT" ]; then
    echo "Could not find $VAR in Makefile."
--- a/.github/bump_vllm_wheel.sh
+++ b/.github/bump_vllm_wheel.sh
@@ -0,0 +1,45 @@
+#!/bin/bash
+# Bump the cublas13 vLLM wheel pin in requirements-cublas13-after.txt.
+#
+# vLLM's PyPI wheel is built against CUDA 12 so the cublas13 build pulls a
+# cu130-flavoured wheel from vLLM's per-tag index at
+# https://wheels.vllm.ai/<TAG>/cu130/. That URL segment is itself version-locked
+# (no /latest/ alias upstream), so bumping vLLM means rewriting both the URL
+# segment and the version constraint atomically. bump_deps.sh handles git-sha
+# vars in Makefiles; this script handles the two-value rewrite specific to the
+# vLLM requirements file.
+set -xe
+REPO=$1   # vllm-project/vllm
+FILE=$2   # backend/python/vllm/requirements-cublas13-after.txt
+VAR=$3    # VLLM_VERSION (used for output file names so the workflow can read them)
+
+if [ -z "$FILE" ] || [ -z "$REPO" ] || [ -z "$VAR" ]; then
+    echo "usage: $0 <repo> <requirements-file> <var-name>" >&2
+    exit 1
+fi
+
+# /releases/latest returns the most recent non-prerelease tag.
+LATEST_TAG=$(curl -sS -H "Accept: application/vnd.github+json" \
+    "https://api.github.com/repos/$REPO/releases/latest" \
+    | python3 -c "import json,sys; print(json.load(sys.stdin)['tag_name'])")
+
+# Strip leading 'v' (vLLM tags are 'v0.20.0', the URL/version use '0.20.0').
+NEW_VERSION="${LATEST_TAG#v}"
+
+set +e
+CURRENT_VERSION=$(grep -oE '^vllm==[0-9]+\.[0-9]+\.[0-9]+' "$FILE" | head -1 | cut -d= -f3)
+set -e
+
+# sed both lines unconditionally — peter-evans/create-pull-request opens no PR
+# when the working tree is clean, so a no-op rewrite is safe.
+sed -i "$FILE" \
+    -e "s|wheels\.vllm\.ai/[^/]*/cu130|wheels.vllm.ai/$NEW_VERSION/cu130|g" \
+    -e "s|^vllm==.*|vllm==$NEW_VERSION|"
+
+if [ -z "$CURRENT_VERSION" ]; then
+    echo "Could not find vllm==X.Y.Z in $FILE."
+    exit 0
+fi
+
+echo "Changes: https://github.com/$REPO/compare/v${CURRENT_VERSION}...${LATEST_TAG}" >> "${VAR}_message.txt"
+echo "${NEW_VERSION}" >> "${VAR}_commit.txt"
--- a/.github/dependabot.yml
+++ b/.github/dependabot.yml
@@ -29,10 +29,6 @@ updates:
    schedule:
      # Check for updates to GitHub Actions every weekday
      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/autogptq"
-    schedule:
-      interval: "weekly"
  - package-ecosystem: "pip"
    directory: "/backend/python/bark"
    schedule:
@@ -65,10 +61,6 @@ updates:
    directory: "/backend/python/openvoice"
    schedule:
      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/parler-tts"
-    schedule:
-      interval: "weekly"
  - package-ecosystem: "pip"
    directory: "/backend/python/rerankers"
    schedule:
--- a/.github/gallery-agent/gallery.go
+++ b/.github/gallery-agent/gallery.go
@@ -0,0 +1,213 @@
+package main
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"os"
+	"strings"
+
+	"github.com/mudler/LocalAI/core/gallery/importers"
+	"sigs.k8s.io/yaml"
+)
+
+func formatTextContent(text string) string {
+	return formatTextContentWithIndent(text, 4, 6)
+}
+
+// formatTextContentWithIndent formats text content with specified base and list item indentation
+func formatTextContentWithIndent(text string, baseIndent int, listItemIndent int) string {
+	var formattedLines []string
+	lines := strings.Split(text, "\n")
+	for _, line := range lines {
+		trimmed := strings.TrimRight(line, " \t\r")
+		if trimmed == "" {
+			// Keep empty lines as empty (no indentation)
+			formattedLines = append(formattedLines, "")
+		} else {
+			// Preserve relative indentation from yaml.Marshal output
+			// Count existing leading spaces to preserve relative structure
+			leadingSpaces := len(trimmed) - len(strings.TrimLeft(trimmed, " \t"))
+			trimmedStripped := strings.TrimLeft(trimmed, " \t")
+
+			var totalIndent int
+			if strings.HasPrefix(trimmedStripped, "-") {
+				// List items: use listItemIndent (ignore existing leading spaces)
+				totalIndent = listItemIndent
+			} else {
+				// Regular lines: use baseIndent + preserve relative indentation
+				// This handles both top-level keys (leadingSpaces=0) and nested properties (leadingSpaces>0)
+				totalIndent = baseIndent + leadingSpaces
+			}
+
+			indentStr := strings.Repeat(" ", totalIndent)
+			formattedLines = append(formattedLines, indentStr+trimmedStripped)
+		}
+	}
+	formattedText := strings.Join(formattedLines, "\n")
+	// Remove any trailing spaces from the formatted description
+	formattedText = strings.TrimRight(formattedText, " \t")
+	return formattedText
+}
+
+// generateYAMLEntry generates a gallery YAML entry for a model, using the given quantization to discover its config
+func generateYAMLEntry(model ProcessedModel, quantization string) string {
+	modelConfig, err := importers.DiscoverModelConfig("https://huggingface.co/"+model.ModelID, json.RawMessage(`{ "quantization": "`+quantization+`"}`))
+	if err != nil {
+		panic(err)
+	}
+
+	// Extract model name from ModelID
+	parts := strings.Split(model.ModelID, "/")
+	modelName := model.ModelID
+	if len(parts) > 0 {
+		modelName = strings.ToLower(parts[len(parts)-1])
+	}
+	// Remove common suffixes
+	modelName = strings.ReplaceAll(modelName, "-gguf", "")
+	modelName = strings.ReplaceAll(modelName, "-q4_k_m", "")
+	modelName = strings.ReplaceAll(modelName, "-q4_k_s", "")
+	modelName = strings.ReplaceAll(modelName, "-q3_k_m", "")
+	modelName = strings.ReplaceAll(modelName, "-q2_k", "")
+
+	description := model.ReadmeContent
+	if description == "" {
+		description = fmt.Sprintf("AI model: %s", modelName)
+	}
+
+	// Clean up description to prevent YAML linting issues
+	description = cleanTextContent(description)
+	formattedDescription := formatTextContent(description)
+
+	// Strip name and description from config file since they are
+	// already present at the gallery entry level and should not
+	// appear under overrides.
+	configFileContent := modelConfig.ConfigFile
+	var cfgMap map[string]any
+	if err := yaml.Unmarshal([]byte(configFileContent), &cfgMap); err == nil {
+		delete(cfgMap, "name")
+		delete(cfgMap, "description")
+		if cleaned, err := yaml.Marshal(cfgMap); err == nil {
+			configFileContent = string(cleaned)
+		}
+	}
+
+	configFile := formatTextContent(configFileContent)
+
+	filesYAML, _ := yaml.Marshal(modelConfig.Files)
+
+	// Files section: list items need 4 spaces (not 6), since files: is at 2 spaces
+	files := formatTextContentWithIndent(string(filesYAML), 4, 4)
+
+	// Build metadata sections
+	var metadataSections []string
+
+	// Add license if present
+	if model.License != "" {
+		metadataSections = append(metadataSections, fmt.Sprintf(`  license: "%s"`, model.License))
+	}
+
+	// Add tags if present
+	if len(model.Tags) > 0 {
+		tagsYAML, _ := yaml.Marshal(model.Tags)
+		tagsFormatted := formatTextContentWithIndent(string(tagsYAML), 4, 4)
+		tagsFormatted = strings.TrimRight(tagsFormatted, "\n")
+		metadataSections = append(metadataSections, fmt.Sprintf("  tags:\n%s", tagsFormatted))
+	}
+
+	// Add icon if present
+	if model.Icon != "" {
+		metadataSections = append(metadataSections, fmt.Sprintf(`  icon: %s`, model.Icon))
+	}
+
+	// Build the metadata block
+	metadataBlock := ""
+	if len(metadataSections) > 0 {
+		metadataBlock = strings.Join(metadataSections, "\n") + "\n"
+	}
+
+	yamlTemplate := ""
+	yamlTemplate = `- name: "%s"
+  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
+  urls:
+    - https://huggingface.co/%s
+  description: |
+%s%s
+  overrides:
+%s
+  files:
+%s`
+	// Trim trailing newlines from formatted sections to prevent extra blank lines
+	formattedDescription = strings.TrimRight(formattedDescription, "\n")
+	configFile = strings.TrimRight(configFile, "\n")
+	files = strings.TrimRight(files, "\n")
+	// Add newline before metadata block if present
+	if metadataBlock != "" {
+		metadataBlock = "\n" + strings.TrimRight(metadataBlock, "\n")
+	}
+	return fmt.Sprintf(yamlTemplate,
+		modelName,
+		model.ModelID,
+		formattedDescription,
+		metadataBlock,
+		configFile,
+		files,
+	)
+}
+
+// generateYAMLForModels generates YAML entries for the selected models and prepends them to the gallery index file
+func generateYAMLForModels(ctx context.Context, models []ProcessedModel, quantization string) error {
+
+	// Generate YAML entries for each model
+	var yamlEntries []string
+	for _, model := range models {
+		fmt.Printf("Generating YAML entry for model: %s\n", model.ModelID)
+
+		// Generate YAML entry
+		yamlEntry := generateYAMLEntry(model, quantization)
+		yamlEntries = append(yamlEntries, yamlEntry)
+	}
+
+	// Prepend to index.yaml (write at the top)
+	if len(yamlEntries) > 0 {
+		indexPath := getGalleryIndexPath()
+		fmt.Printf("Prepending YAML entries to %s...\n", indexPath)
+
+		// Read current content
+		content, err := os.ReadFile(indexPath)
+		if err != nil {
+			return fmt.Errorf("failed to read %s: %w", indexPath, err)
+		}
+
+		existingContent := string(content)
+		yamlBlock := strings.Join(yamlEntries, "\n")
+
+		// Check if file starts with "---"
+		var newContent string
+		if strings.HasPrefix(existingContent, "---\n") {
+			// File starts with "---", prepend new entries after it
+			restOfContent := strings.TrimPrefix(existingContent, "---\n")
+			// Ensure proper spacing: "---\n" + new entries + "\n" + rest of content
+			newContent = "---\n" + yamlBlock + "\n" + restOfContent
+		} else if strings.HasPrefix(existingContent, "---") {
+			// File starts with "---" but no newline after
+			restOfContent := strings.TrimPrefix(existingContent, "---")
+			newContent = "---\n" + yamlBlock + "\n" + strings.TrimPrefix(restOfContent, "\n")
+		} else {
+			// No "---" at start, prepend new entries at the very beginning
+			// Trim leading whitespace from existing content
+			existingContent = strings.TrimLeft(existingContent, " \t\n\r")
+			newContent = yamlBlock + "\n" + existingContent
+		}
+
+		// Write back to file
+		err = os.WriteFile(indexPath, []byte(newContent), 0644)
+		if err != nil {
+			return fmt.Errorf("failed to write %s: %w", indexPath, err)
+		}
+
+		fmt.Printf("Successfully prepended %d models to %s\n", len(yamlEntries), indexPath)
+	}
+
+	return nil
+}
--- a/.github/gallery-agent/helpers.go
+++ b/.github/gallery-agent/helpers.go
@@ -0,0 +1,301 @@
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"io"
+	"net/http"
+	"os"
+	"regexp"
+	"strings"
+
+	hfapi "github.com/mudler/LocalAI/pkg/huggingface-api"
+	"sigs.k8s.io/yaml"
+)
+
+var galleryIndexPath = os.Getenv("GALLERY_INDEX_PATH")
+
+// getGalleryIndexPath returns the gallery index file path, with a default fallback
+func getGalleryIndexPath() string {
+	if galleryIndexPath != "" {
+		return galleryIndexPath
+	}
+	return "gallery/index.yaml"
+}
+
+type galleryModel struct {
+	Name string   `yaml:"name"`
+	Urls []string `yaml:"urls"`
+}
+
+// loadGalleryURLSet parses gallery/index.yaml once and returns the set of
+// HuggingFace model URLs already present in the gallery.
+func loadGalleryURLSet() (map[string]struct{}, error) {
+	indexPath := getGalleryIndexPath()
+	content, err := os.ReadFile(indexPath)
+	if err != nil {
+		return nil, fmt.Errorf("failed to read %s: %w", indexPath, err)
+	}
+
+	var galleryModels []galleryModel
+	if err := yaml.Unmarshal(content, &galleryModels); err != nil {
+		return nil, fmt.Errorf("failed to unmarshal %s: %w", indexPath, err)
+	}
+
+	set := make(map[string]struct{}, len(galleryModels))
+	for _, gm := range galleryModels {
+		for _, u := range gm.Urls {
+			set[u] = struct{}{}
+		}
+	}
+
+	// Also skip URLs already proposed in open (unmerged) gallery-agent PRs.
+	// The workflow injects these via EXTRA_SKIP_URLS so we don't keep
+	// re-proposing the same model every run while a PR is waiting to merge.
+	for _, line := range strings.FieldsFunc(os.Getenv("EXTRA_SKIP_URLS"), func(r rune) bool {
+		return r == '\n' || r == ',' || r == ' '
+	}) {
+		u := strings.TrimSpace(line)
+		if u != "" {
+			set[u] = struct{}{}
+		}
+	}
+
+	return set, nil
+}
+
+// modelAlreadyInGallery checks whether a HuggingFace model repo is already
+// referenced in the gallery URL set.
+func modelAlreadyInGallery(set map[string]struct{}, modelID string) bool {
+	_, ok := set["https://huggingface.co/"+modelID]
+	return ok
+}
+
+// baseModelFromTags returns the first `base_model:<repo>` value found in the
+// tag list, or "" if none is present. HuggingFace surfaces the base model
+// declared in the model card's YAML frontmatter as such a tag.
+func baseModelFromTags(tags []string) string {
+	for _, t := range tags {
+		if strings.HasPrefix(t, "base_model:") {
+			return strings.TrimPrefix(t, "base_model:")
+		}
+	}
+	return ""
+}
+
+// licenseFromTags returns the `license:<id>` value from the tag list, or "".
+func licenseFromTags(tags []string) string {
+	for _, t := range tags {
+		if strings.HasPrefix(t, "license:") {
+			return strings.TrimPrefix(t, "license:")
+		}
+	}
+	return ""
+}
+
+// curatedTags produces the gallery tag list from HuggingFace's raw tag set.
+// Always includes llm + gguf, then adds whitelisted family / capability
+// markers when they appear in the HF tag list.
+func curatedTags(hfTags []string) []string {
+	whitelist := []string{
+		"gpu", "cpu",
+		"llama", "mistral", "mixtral", "qwen", "qwen2", "qwen3",
+		"gemma", "gemma2", "gemma3", "phi", "phi3", "phi4",
+		"deepseek", "yi", "falcon", "command-r",
+		"vision", "multimodal", "code", "chat",
+		"instruction-tuned", "reasoning", "thinking",
+	}
+	seen := map[string]struct{}{}
+	out := []string{"llm", "gguf"}
+	seen["llm"] = struct{}{}
+	seen["gguf"] = struct{}{}
+
+	hfSet := map[string]struct{}{}
+	for _, t := range hfTags {
+		hfSet[strings.ToLower(t)] = struct{}{}
+	}
+	for _, w := range whitelist {
+		if _, ok := hfSet[w]; ok {
+			if _, dup := seen[w]; !dup {
+				out = append(out, w)
+				seen[w] = struct{}{}
+			}
+		}
+	}
+	return out
+}
+
+// resolveReadme fetches a description-quality README for a (possibly
+// quantized) repo: if a `base_model:` tag is present, fetch the base repo's
+// README; otherwise fall back to the repo's own README.
+func resolveReadme(client *hfapi.Client, modelID string, hfTags []string) (string, error) {
+	if base := baseModelFromTags(hfTags); base != "" && base != modelID {
+		if content, err := client.GetReadmeContent(base, "README.md"); err == nil && strings.TrimSpace(content) != "" {
+			return cleanTextContent(content), nil
+		}
+	}
+	content, err := client.GetReadmeContent(modelID, "README.md")
+	if err != nil {
+		return "", err
+	}
+	return cleanTextContent(content), nil
+}
+
+// extractDescription turns a raw HuggingFace README into a concise plain-text
+// description suitable for embedding in gallery/index.yaml: strips YAML
+// frontmatter, HTML tags/comments, markdown images, link URLs (keeping the
+// link text), markdown tables, and then truncates at a paragraph boundary
+// around ~1200 characters. Raw README should still be used for icon
+// extraction — call this only for the `description:` field.
+func extractDescription(readme string) string {
+	s := readme
+
+	// Strip leading YAML frontmatter: `---\n...\n---\n` at start of file.
+	if strings.HasPrefix(strings.TrimLeft(s, " \t\n"), "---") {
+		trimmed := strings.TrimLeft(s, " \t\n")
+		rest := strings.TrimPrefix(trimmed, "---")
+		if idx := strings.Index(rest, "\n---"); idx >= 0 {
+			after := rest[idx+len("\n---"):]
+			after = strings.TrimPrefix(after, "\n")
+			s = after
+		}
+	}
+
+	// Strip HTML comments and tags.
+	s = regexp.MustCompile(`(?s)<!--.*?-->`).ReplaceAllString(s, "")
+	s = regexp.MustCompile(`(?is)<[^>]+>`).ReplaceAllString(s, "")
+
+	// Strip markdown images entirely.
+	s = regexp.MustCompile(`!\[[^\]]*\]\([^)]*\)`).ReplaceAllString(s, "")
+	// Replace markdown links `[text](url)` with just `text`.
+	s = regexp.MustCompile(`\[([^\]]+)\]\([^)]+\)`).ReplaceAllString(s, "$1")
+
+	// Drop table lines and horizontal rules, and flatten all leading
+	// whitespace: generateYAMLEntry embeds this under a `description: |`
+	// literal block whose indentation is set by the first non-empty line.
+	// If any line has extra leading whitespace (e.g. from an indented
+	// `<p align="center">` block in the original README), YAML will pick
+	// that up as the block's indent and every later line at a smaller
+	// indent blows the block scalar. Stripping leading whitespace here
+	// guarantees uniform 4-space indentation after formatTextContent runs.
+	var kept []string
+	for _, line := range strings.Split(s, "\n") {
+		t := strings.TrimLeft(line, " \t")
+		ts := strings.TrimSpace(t)
+		if strings.HasPrefix(ts, "|") {
+			continue
+		}
+		if strings.HasPrefix(ts, ":--") || strings.HasPrefix(ts, "---") || strings.HasPrefix(ts, "===") {
+			continue
+		}
+		kept = append(kept, t)
+	}
+	s = strings.Join(kept, "\n")
+
+	// Normalise whitespace and drop any leading blank lines so the literal
+	// block in YAML doesn't start with a blank first line (which would
+	// break the indentation detector the same way).
+	s = cleanTextContent(s)
+	s = strings.TrimLeft(s, " \t\n")
+
+	// Truncate at a paragraph boundary around maxLen chars.
+	const maxLen = 1200
+	if len(s) > maxLen {
+		cut := strings.LastIndex(s[:maxLen], "\n\n")
+		if cut < maxLen/3 {
+			cut = maxLen
+		}
+		s = strings.TrimRight(s[:cut], " \t\n") + "\n\n..."
+	}
+
+	return s
+}
+
+// cleanTextContent removes trailing spaces/tabs and collapses multiple empty
+// lines so README content embeds cleanly into YAML without lint noise.
+func cleanTextContent(text string) string {
+	lines := strings.Split(text, "\n")
+	var cleaned []string
+	var prevEmpty bool
+	for _, line := range lines {
+		trimmed := strings.TrimRight(line, " \t\r")
+		if trimmed == "" {
+			if !prevEmpty {
+				cleaned = append(cleaned, "")
+			}
+			prevEmpty = true
+		} else {
+			cleaned = append(cleaned, trimmed)
+			prevEmpty = false
+		}
+	}
+	return strings.TrimRight(strings.Join(cleaned, "\n"), "\n")
+}
+
+// extractIconFromReadme scans README content for an image URL usable as a
+// gallery entry icon.
+func extractIconFromReadme(readmeContent string) string {
+	if readmeContent == "" {
+		return ""
+	}
+
+	markdownImageRegex := regexp.MustCompile(`(?i)!\[[^\]]*\]\(([^)]+\.(png|jpg|jpeg|svg|webp|gif))\)`)
+	htmlImageRegex := regexp.MustCompile(`(?i)<img[^>]+src=["']([^"']+\.(png|jpg|jpeg|svg|webp|gif))["']`)
+	plainImageRegex := regexp.MustCompile(`(?i)https?://[^\s<>"']+\.(png|jpg|jpeg|svg|webp|gif)`)
+
+	if m := markdownImageRegex.FindStringSubmatch(readmeContent); len(m) > 1 && strings.HasPrefix(strings.ToLower(m[1]), "http") {
+		return strings.TrimSpace(m[1])
+	}
+	if m := htmlImageRegex.FindStringSubmatch(readmeContent); len(m) > 1 && strings.HasPrefix(strings.ToLower(m[1]), "http") {
+		return strings.TrimSpace(m[1])
+	}
+	if m := plainImageRegex.FindStringSubmatch(readmeContent); len(m) > 0 && strings.HasPrefix(strings.ToLower(m[0]), "http") {
+		return strings.TrimSpace(m[0])
+	}
+	return ""
+}
+
+// getHuggingFaceAvatarURL returns the HF avatar URL for a user, or "".
+func getHuggingFaceAvatarURL(author string) string {
+	if author == "" {
+		return ""
+	}
+	userURL := fmt.Sprintf("https://huggingface.co/api/users/%s/overview", author)
+	resp, err := http.Get(userURL)
+	if err != nil {
+		return ""
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusOK {
+		return ""
+	}
+	body, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return ""
+	}
+	var info map[string]any
+	if err := json.Unmarshal(body, &info); err != nil {
+		return ""
+	}
+	if v, ok := info["avatarUrl"].(string); ok && v != "" {
+		return v
+	}
+	if v, ok := info["avatar"].(string); ok && v != "" {
+		return v
+	}
+	return ""
+}
+
+// extractModelIcon extracts an icon URL from the README, falling back to the
+// HuggingFace user avatar.
+func extractModelIcon(model ProcessedModel) string {
+	if icon := extractIconFromReadme(model.ReadmeContent); icon != "" {
+		return icon
+	}
+	if model.Author != "" {
+		if avatar := getHuggingFaceAvatarURL(model.Author); avatar != "" {
+			return avatar
+		}
+	}
+	return ""
+}
--- a/.github/gallery-agent/main.go
+++ b/.github/gallery-agent/main.go
@@ -0,0 +1,280 @@
+package main
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"os"
+	"strconv"
+	"time"
+
+	hfapi "github.com/mudler/LocalAI/pkg/huggingface-api"
+)
+
+// ProcessedModelFile represents a processed model file with additional metadata
+type ProcessedModelFile struct {
+	Path     string `json:"path"`
+	Size     int64  `json:"size"`
+	SHA256   string `json:"sha256"`
+	IsReadme bool   `json:"is_readme"`
+	FileType string `json:"file_type"` // "model", "readme", "other"
+}
+
+// ProcessedModel represents a processed model with all gathered metadata
+type ProcessedModel struct {
+	ModelID                 string               `json:"model_id"`
+	Author                  string               `json:"author"`
+	Downloads               int                  `json:"downloads"`
+	LastModified            string               `json:"last_modified"`
+	Files                   []ProcessedModelFile `json:"files"`
+	PreferredModelFile      *ProcessedModelFile  `json:"preferred_model_file,omitempty"`
+	ReadmeFile              *ProcessedModelFile  `json:"readme_file,omitempty"`
+	ReadmeContent           string               `json:"readme_content,omitempty"`
+	ReadmeContentPreview    string               `json:"readme_content_preview,omitempty"`
+	QuantizationPreferences []string             `json:"quantization_preferences"`
+	ProcessingError         string               `json:"processing_error,omitempty"`
+	Tags                    []string             `json:"tags,omitempty"`
+	License                 string               `json:"license,omitempty"`
+	Icon                    string               `json:"icon,omitempty"`
+}
+
+// AddedModelSummary represents a summary of models added to the gallery
+type AddedModelSummary struct {
+	SearchTerm     string   `json:"search_term"`
+	TotalFound     int      `json:"total_found"`
+	ModelsAdded    int      `json:"models_added"`
+	AddedModelIDs  []string `json:"added_model_ids"`
+	AddedModelURLs []string `json:"added_model_urls"`
+	Quantization   string   `json:"quantization"`
+	ProcessingTime string   `json:"processing_time"`
+}
+
+func main() {
+	startTime := time.Now()
+
+	// Synthetic mode for local testing
+	if sm := os.Getenv("SYNTHETIC_MODE"); sm == "true" || sm == "1" {
+		fmt.Println("Running in SYNTHETIC MODE - generating random test data")
+		if err := runSyntheticMode(); err != nil {
+			fmt.Fprintf(os.Stderr, "Error in synthetic mode: %v\n", err)
+			os.Exit(1)
+		}
+		return
+	}
+
+	searchTerm := os.Getenv("SEARCH_TERM")
+	if searchTerm == "" {
+		searchTerm = "GGUF"
+	}
+
+	limitStr := os.Getenv("LIMIT")
+	if limitStr == "" {
+		limitStr = "15"
+	}
+	limit, err := strconv.Atoi(limitStr)
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "Error parsing LIMIT: %v\n", err)
+		os.Exit(1)
+	}
+
+	quantization := os.Getenv("QUANTIZATION")
+	if quantization == "" {
+		quantization = "Q4_K_M"
+	}
+
+	maxModelsStr := os.Getenv("MAX_MODELS")
+	if maxModelsStr == "" {
+		maxModelsStr = "1"
+	}
+	maxModels, err := strconv.Atoi(maxModelsStr)
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "Error parsing MAX_MODELS: %v\n", err)
+		os.Exit(1)
+	}
+
+	fmt.Printf("Gallery Agent Configuration:\n")
+	fmt.Printf("  Search Term: %s\n", searchTerm)
+	fmt.Printf("  Limit: %d\n", limit)
+	fmt.Printf("  Quantization: %s\n", quantization)
+	fmt.Printf("  Max Models to Add: %d\n", maxModels)
+	fmt.Printf("  Gallery Index Path: %s\n", getGalleryIndexPath())
+	fmt.Println()
+
+	// Phase 1: load current gallery and query HuggingFace.
+	gallerySet, err := loadGalleryURLSet()
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "Error loading gallery index: %v\n", err)
+		os.Exit(1)
+	}
+	fmt.Printf("Loaded %d existing gallery entries\n", len(gallerySet))
+
+	client := hfapi.NewClient()
+
+	fmt.Println("Searching for trending models on HuggingFace...")
+	rawModels, err := client.GetTrending(searchTerm, limit)
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "Error fetching models: %v\n", err)
+		os.Exit(1)
+	}
+	fmt.Printf("Found %d trending models matching %q\n", len(rawModels), searchTerm)
+	totalFound := len(rawModels)
+
+	// Phase 2: drop anything already in the gallery *before* any expensive
+	// per-model work (GetModelDetails, README fetches, icon lookups).
+	fresh := rawModels[:0]
+	for _, m := range rawModels {
+		if modelAlreadyInGallery(gallerySet, m.ModelID) {
+			fmt.Printf("Skipping existing model: %s\n", m.ModelID)
+			continue
+		}
+		fresh = append(fresh, m)
+	}
+	fmt.Printf("%d candidates after gallery dedup\n", len(fresh))
+
+	// Phase 3: HuggingFace already returned these in trendingScore order —
+	// just cap to MAX_MODELS.
+	if len(fresh) > maxModels {
+		fresh = fresh[:maxModels]
+	}
+	if len(fresh) == 0 {
+		fmt.Println("No new models to add to the gallery.")
+		writeSummary(AddedModelSummary{
+			SearchTerm:     searchTerm,
+			TotalFound:     totalFound,
+			ModelsAdded:    0,
+			Quantization:   quantization,
+			ProcessingTime: time.Since(startTime).String(),
+		})
+		return
+	}
+
+	// Phase 4: fetch details and build ProcessedModel entries for survivors.
+	var processed []ProcessedModel
+	quantPrefs := []string{quantization, "Q4_K_M", "Q4_K_S", "Q3_K_M", "Q2_K", "Q8_0"}
+	for _, m := range fresh {
+		fmt.Printf("Processing model: %s (downloads=%d)\n", m.ModelID, m.Downloads)
+
+		pm := ProcessedModel{
+			ModelID:                 m.ModelID,
+			Author:                  m.Author,
+			Downloads:               m.Downloads,
+			LastModified:            m.LastModified,
+			QuantizationPreferences: quantPrefs,
+		}
+
+		details, err := client.GetModelDetails(m.ModelID)
+		if err != nil {
+			fmt.Printf("  Error getting model details: %v (skipping)\n", err)
+			continue
+		}
+
+		preferred := hfapi.FindPreferredModelFile(details.Files, quantPrefs)
+		if preferred == nil {
+			fmt.Printf("  No GGUF file matching %v — skipping\n", quantPrefs)
+			continue
+		}
+
+		pm.Files = make([]ProcessedModelFile, len(details.Files))
+		for j, f := range details.Files {
+			fileType := "other"
+			if f.IsReadme {
+				fileType = "readme"
+			} else if f.Path == preferred.Path {
+				fileType = "model"
+			}
+			pm.Files[j] = ProcessedModelFile{
+				Path:     f.Path,
+				Size:     f.Size,
+				SHA256:   f.SHA256,
+				IsReadme: f.IsReadme,
+				FileType: fileType,
+			}
+			if f.Path == preferred.Path {
+				copyFile := pm.Files[j]
+				pm.PreferredModelFile = &copyFile
+			}
+			if f.IsReadme {
+				copyFile := pm.Files[j]
+				pm.ReadmeFile = &copyFile
+			}
+		}
+
+		// Deterministic README resolution: follow base_model tag if set.
+		// Keep the raw (HTML-bearing) README around while we extract the
+		// icon, then strip it down to a plain-text description for the
+		// `description:` YAML field.
+		readme, err := resolveReadme(client, m.ModelID, m.Tags)
+		if err != nil {
+			fmt.Printf("  Warning: failed to fetch README: %v\n", err)
+		}
+		pm.ReadmeContent = readme
+
+		pm.License = licenseFromTags(m.Tags)
+		pm.Tags = curatedTags(m.Tags)
+		pm.Icon = extractModelIcon(pm)
+
+		if pm.ReadmeContent != "" {
+			pm.ReadmeContent = extractDescription(pm.ReadmeContent)
+			pm.ReadmeContentPreview = truncateString(pm.ReadmeContent, 200)
+		}
+
+		fmt.Printf("  License: %s, Tags: %v, Icon: %s\n", pm.License, pm.Tags, pm.Icon)
+		processed = append(processed, pm)
+	}
+
+	if len(processed) == 0 {
+		fmt.Println("No processable models after detail fetch.")
+		writeSummary(AddedModelSummary{
+			SearchTerm:     searchTerm,
+			TotalFound:     totalFound,
+			ModelsAdded:    0,
+			Quantization:   quantization,
+			ProcessingTime: time.Since(startTime).String(),
+		})
+		return
+	}
+
+	// Phase 5: write YAML entries.
+	var addedIDs, addedURLs []string
+	for _, pm := range processed {
+		addedIDs = append(addedIDs, pm.ModelID)
+		addedURLs = append(addedURLs, "https://huggingface.co/"+pm.ModelID)
+	}
+
+	fmt.Println("Generating YAML entries for selected models...")
+	if err := generateYAMLForModels(context.Background(), processed, quantization); err != nil {
+		fmt.Fprintf(os.Stderr, "Error generating YAML entries: %v\n", err)
+		os.Exit(1)
+	}
+
+	writeSummary(AddedModelSummary{
+		SearchTerm:     searchTerm,
+		TotalFound:     totalFound,
+		ModelsAdded:    len(addedIDs),
+		AddedModelIDs:  addedIDs,
+		AddedModelURLs: addedURLs,
+		Quantization:   quantization,
+		ProcessingTime: time.Since(startTime).String(),
+	})
+}
+
+func writeSummary(summary AddedModelSummary) {
+	data, err := json.MarshalIndent(summary, "", "  ")
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "Error marshaling summary: %v\n", err)
+		return
+	}
+	if err := os.WriteFile("gallery-agent-summary.json", data, 0644); err != nil {
+		fmt.Fprintf(os.Stderr, "Error writing summary file: %v\n", err)
+		return
+	}
+	fmt.Println("Summary written to gallery-agent-summary.json")
+}
+
+func truncateString(s string, maxLen int) string {
+	if r := []rune(s); len(r) > maxLen { // runes, so UTF-8 is never split
+		return string(r[:maxLen]) + "..."
+	}
+	return s
+}
+
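Phase 2 in `main` filters `rawModels` with `fresh := rawModels[:0]`, which reuses the backing array instead of allocating a new slice. The following is a minimal sketch of that idiom, not code from the gallery agent; the generic helper `filterInPlace` and the sample IDs are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// filterInPlace is a hypothetical helper: it keeps the elements for which
// keep returns true and writes them over the front of the same backing array.
func filterInPlace[T any](items []T, keep func(T) bool) []T {
	out := items[:0] // zero length, same backing array as items
	for _, it := range items {
		if keep(it) {
			out = append(out, it)
		}
	}
	return out
}

func main() {
	ids := []string{"a/one", "b/two", "c/three"}
	known := map[string]bool{"b/two": true} // stands in for the gallery set

	fresh := filterInPlace(ids, func(id string) bool { return !known[id] })
	fmt.Println(fresh)

	// ids shares storage with fresh, so its contents are no longer the
	// original list and should not be read after filtering.
}
```

The write index never passes the read index, so the loop is safe, but the input slice is clobbered. Any value derived from the unfiltered list has to come from data that survives the overwrite; `totalFound` in `main` is a length, which does.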
--- a/.github/gallery-agent/testing.go
+++ b/.github/gallery-agent/testing.go
@@ -0,0 +1,224 @@
+package main
+
+import (
+	"context"
+	"fmt"
+	"math/rand/v2"
+	"strings"
+	"time"
+)
+
+// runSyntheticMode generates synthetic test data and appends it to the gallery
+func runSyntheticMode() error {
+	generator := NewSyntheticDataGenerator()
+
+	// Generate a random number of synthetic models (1-3)
+	numModels := generator.rand.IntN(3) + 1
+	fmt.Printf("Generating %d synthetic models for testing...\n", numModels)
+
+	var models []ProcessedModel
+	for range numModels {
+		model := generator.GenerateProcessedModel()
+		models = append(models, model)
+		fmt.Printf("Generated synthetic model: %s\n", model.ModelID)
+	}
+
+	// Generate YAML entries and append to gallery/index.yaml
+	fmt.Println("Generating YAML entries for synthetic models...")
+	err := generateYAMLForModels(context.Background(), models, "Q4_K_M")
+	if err != nil {
+		return fmt.Errorf("error generating YAML entries: %w", err)
+	}
+
+	fmt.Printf("Successfully added %d synthetic models to the gallery for testing!\n", len(models))
+	return nil
+}
+
+// SyntheticDataGenerator provides methods to generate synthetic test data
+type SyntheticDataGenerator struct {
+	rand *rand.Rand
+}
+
+// NewSyntheticDataGenerator creates a new synthetic data generator
+func NewSyntheticDataGenerator() *SyntheticDataGenerator {
+	return &SyntheticDataGenerator{
+		rand: rand.New(rand.NewPCG(uint64(time.Now().UnixNano()), 0)),
+	}
+}
+
+// GenerateProcessedModelFile creates a synthetic ProcessedModelFile
+func (g *SyntheticDataGenerator) GenerateProcessedModelFile() ProcessedModelFile {
+	fileTypes := []string{"model", "readme", "other"}
+	fileType := fileTypes[g.rand.IntN(len(fileTypes))]
+
+	var path string
+	var isReadme bool
+
+	switch fileType {
+	case "model":
+		path = fmt.Sprintf("model-%s.gguf", g.randomString(8))
+		isReadme = false
+	case "readme":
+		path = "README.md"
+		isReadme = true
+	default:
+		path = fmt.Sprintf("file-%s.txt", g.randomString(6))
+		isReadme = false
+	}
+
+	return ProcessedModelFile{
+		Path:     path,
+		Size:     int64(g.rand.IntN(1000000000) + 1000000), // 1MB to 1GB
+		SHA256:   g.randomSHA256(),
+		IsReadme: isReadme,
+		FileType: fileType,
+	}
+}
+
+// GenerateProcessedModel creates a synthetic ProcessedModel
+func (g *SyntheticDataGenerator) GenerateProcessedModel() ProcessedModel {
+	authors := []string{"microsoft", "meta", "google", "openai", "anthropic", "mistralai", "huggingface"}
+	modelNames := []string{"llama", "gpt", "claude", "mistral", "gemma", "phi", "qwen", "codellama"}
+
+	author := authors[g.rand.IntN(len(authors))]
+	modelName := modelNames[g.rand.IntN(len(modelNames))]
+	modelID := fmt.Sprintf("%s/%s-%s", author, modelName, g.randomString(6))
+
+	// Generate files
+	numFiles := g.rand.IntN(5) + 2 // 2-6 files
+	files := make([]ProcessedModelFile, numFiles)
+
+	// Ensure at least one model file and one readme
+	hasModelFile := false
+	hasReadme := false
+
+	for i := range numFiles {
+		files[i] = g.GenerateProcessedModelFile()
+		if files[i].FileType == "model" {
+			hasModelFile = true
+		}
+		if files[i].FileType == "readme" {
+			hasReadme = true
+		}
+	}
+
+	// Add required files if missing
+	if !hasModelFile {
+		modelFile := g.GenerateProcessedModelFile()
+		modelFile.FileType, modelFile.IsReadme = "model", false // the random draw may have been a readme
+		modelFile.Path = fmt.Sprintf("%s-Q4_K_M.gguf", modelName)
+		files = append(files, modelFile)
+	}
+
+	if !hasReadme {
+		readmeFile := g.GenerateProcessedModelFile()
+		readmeFile.FileType = "readme"
+		readmeFile.Path = "README.md"
+		readmeFile.IsReadme = true
+		files = append(files, readmeFile)
+	}
+
+	// Find preferred model file
+	var preferredModelFile *ProcessedModelFile
+	for i := range files {
+		if files[i].FileType == "model" {
+			preferredModelFile = &files[i]
+			break
+		}
+	}
+
+	// Find readme file
+	var readmeFile *ProcessedModelFile
+	for i := range files {
+		if files[i].FileType == "readme" {
+			readmeFile = &files[i]
+			break
+		}
+	}
+
+	readmeContent := g.generateReadmeContent(modelName, author)
+
+	// Generate sample metadata
+	licenses := []string{"apache-2.0", "mit", "llama2", "gpl-3.0", "bsd", ""}
+	license := licenses[g.rand.IntN(len(licenses))]
+
+	sampleTags := []string{"llm", "gguf", "gpu", "cpu", "text-to-text", "chat", "instruction-tuned"}
+	numTags := g.rand.IntN(4) + 3 // 3-6 tags
+	tags := make([]string, numTags)
+	for i := range numTags {
+		tags[i] = sampleTags[g.rand.IntN(len(sampleTags))]
+	}
+	// Remove duplicates
+	tags = g.removeDuplicates(tags)
+
+	// Optionally include icon (50% chance)
+	icon := ""
+	if g.rand.IntN(2) == 0 {
+		icon = fmt.Sprintf("https://cdn-avatars.huggingface.co/v1/production/uploads/%s.png", g.randomString(24))
+	}
+
+	return ProcessedModel{
+		ModelID:                 modelID,
+		Author:                  author,
+		Downloads:               g.rand.IntN(1000000) + 1000,
+		LastModified:            g.randomDate(),
+		Files:                   files,
+		PreferredModelFile:      preferredModelFile,
+		ReadmeFile:              readmeFile,
+		ReadmeContent:           readmeContent,
+		ReadmeContentPreview:    truncateString(readmeContent, 200),
+		QuantizationPreferences: []string{"Q4_K_M", "Q4_K_S", "Q3_K_M", "Q2_K"},
+		ProcessingError:         "",
+		Tags:                    tags,
+		License:                 license,
+		Icon:                    icon,
+	}
+}
+
+// Helper methods for synthetic data generation
+func (g *SyntheticDataGenerator) randomString(length int) string {
+	const charset = "abcdefghijklmnopqrstuvwxyz0123456789"
+	b := make([]byte, length)
+	for i := range b {
+		b[i] = charset[g.rand.IntN(len(charset))]
+	}
+	return string(b)
+}
+
+func (g *SyntheticDataGenerator) randomSHA256() string {
+	const charset = "0123456789abcdef"
+	b := make([]byte, 64)
+	for i := range b {
+		b[i] = charset[g.rand.IntN(len(charset))]
+	}
+	return string(b)
+}
+
+func (g *SyntheticDataGenerator) randomDate() string {
+	now := time.Now().UTC() // the layout below ends in a literal "Z", so the time must be UTC
+	daysAgo := g.rand.IntN(365) // Random date within last year
+	pastDate := now.AddDate(0, 0, -daysAgo)
+	return pastDate.Format("2006-01-02T15:04:05.000Z")
+}
+
+func (g *SyntheticDataGenerator) removeDuplicates(slice []string) []string {
+	keys := make(map[string]bool)
+	result := []string{}
+	for _, item := range slice {
+		if !keys[item] {
+			keys[item] = true
+			result = append(result, item)
+		}
+	}
+	return result
+}
+
+func (g *SyntheticDataGenerator) generateReadmeContent(modelName, author string) string {
+	templates := []string{
+		fmt.Sprintf("# %s Model\n\nThis is a %s model developed by %s. It's designed for various natural language processing tasks including text generation, question answering, and conversation.\n\n## Features\n\n- High-quality text generation\n- Efficient inference\n- Multiple quantization options\n- Easy to use with LocalAI\n\n## Usage\n\nUse this model with LocalAI for various AI tasks.", strings.Title(modelName), modelName, author),
+		fmt.Sprintf("# %s\n\nA powerful language model from %s. This model excels at understanding and generating human-like text across multiple domains.\n\n## Capabilities\n\n- Text completion\n- Code generation\n- Creative writing\n- Technical documentation\n\n## Model Details\n\n- Architecture: Transformer-based\n- Training: Large-scale supervised learning\n- Quantization: Available in multiple formats", strings.Title(modelName), author),
+		fmt.Sprintf("# %s Language Model\n\nDeveloped by %s, this model represents state-of-the-art performance in natural language understanding and generation.\n\n## Key Features\n\n- Multilingual support\n- Context-aware responses\n- Efficient memory usage\n- Fast inference speed\n\n## Applications\n\n- Chatbots and virtual assistants\n- Content generation\n- Code completion\n- Educational tools", strings.Title(modelName), author),
+	}
+
+	return templates[g.rand.IntN(len(templates))]
+}
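`NewSyntheticDataGenerator` seeds its PCG source from the clock, so every synthetic run produces different models. When a run needs to be reproduced, a fixed seed gives the same sequence each time. The following is a minimal sketch under that assumption; `newSeededRand` is a hypothetical helper and not part of `testing.go`.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// newSeededRand is a hypothetical helper: two generators built from the same
// seed produce the same sequence of values.
func newSeededRand(seed uint64) *rand.Rand {
	return rand.New(rand.NewPCG(seed, 0))
}

func main() {
	a, b := newSeededRand(42), newSeededRand(42)
	for i := 0; i < 3; i++ {
		// Both generators advance in lockstep, so each comparison is true.
		fmt.Println(a.IntN(1000) == b.IntN(1000))
	}
}
```

One way to use this would be to accept the seed from the environment and log it at startup, so a surprising synthetic entry can be regenerated later.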
--- a/.github/labeler.yml
+++ b/.github/labeler.yml
@@ -1,4 +1,4 @@
-enhancements:
+enhancement:
 - head-branch: ['^feature', 'feature']

 dependencies:
--- a/.github/workflows/backend.yml
+++ b/.github/workflows/backend.yml
--- a/.github/workflows/backend_build.yml
+++ b/.github/workflows/backend_build.yml
@@ -0,0 +1,269 @@
+---
+name: 'build backend container images (reusable)'
+
+on:
+  workflow_call:
+    inputs:
+      base-image:
+        description: 'Base image'
+        required: true
+        type: string
+      build-type:
+        description: 'Build type'
+        default: ''
+        type: string
+      cuda-major-version:
+        description: 'CUDA major version'
+        default: "12"
+        type: string
+      cuda-minor-version:
+        description: 'CUDA minor version'
+        default: "1"
+        type: string
+      platforms:
+        description: 'Platforms'
+        default: ''
+        type: string
+      tag-latest:
+        description: 'Tag latest'
+        default: ''
+        type: string
+      tag-suffix:
+        description: 'Tag suffix'
+        default: ''
+        type: string
+      runs-on:
+        description: 'Runs on'
+        required: true
+        default: ''
+        type: string
+      backend:
+        description: 'Backend to build'
+        required: true
+        type: string
+      context:
+        description: 'Build context'
+        required: true
+        type: string
+      dockerfile:
+        description: 'Build Dockerfile'
+        required: true
+        type: string
+      skip-drivers:
+        description: 'Skip drivers'
+        default: 'false'
+        type: string
+      ubuntu-version:
+        description: 'Ubuntu version'
+        required: false
+        default: '2204'
+        type: string
+      amdgpu-targets:
+        description: 'AMD GPU targets for ROCm/HIP builds'
+        required: false
+        default: ''
+        type: string
+    secrets:
+      dockerUsername:
+        required: false
+      dockerPassword:
+        required: false
+      quayUsername:
+        required: true
+      quayPassword:
+        required: true
+
+jobs:
+  backend-build:
+    runs-on: ${{ inputs.runs-on }}
+    env:
+        quay_username: ${{ secrets.quayUsername }}
+    steps:
+
+      - name: Checkout
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+
+      - name: Configure apt mirror on runner
+        id: apt_mirror
+        uses: ./.github/actions/configure-apt-mirror
+
+      - name: Free Disk Space (Ubuntu)
+        if: inputs.runs-on == 'ubuntu-latest'
+        uses: jlumbroso/free-disk-space@main
+        with:
+          # this might remove tools that are actually needed,
+          # if set to "true" but frees about 6 GB
+          tool-cache: true
+          # all of these default to true, but feel free to set to
+          # "false" if necessary for your workflow
+          android: true
+          dotnet: true
+          haskell: true
+          large-packages: true
+          docker-images: true
+          swap-storage: true
+
+      - name: Release space from worker
+        if: inputs.runs-on == 'ubuntu-latest'
+        run: |
+          echo "Listing top largest packages"
+          pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
+          head -n 30 <<< "${pkgs}"
+          echo
+          df -h
+          echo
+          sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
+          sudo apt-get remove --auto-remove android-sdk-platform-tools snapd || true
+          sudo apt-get purge --auto-remove android-sdk-platform-tools snapd || true
+          sudo rm -rf /usr/local/lib/android
+          sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
+          sudo rm -rf /usr/share/dotnet
+          sudo apt-get remove -y '^mono-.*' || true
+          sudo apt-get remove -y '^ghc-.*' || true
+          sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
+          sudo apt-get remove -y 'php.*' || true
+          sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
+          sudo apt-get remove -y '^google-.*' || true
+          sudo apt-get remove -y azure-cli || true
+          sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
+          sudo apt-get remove -y '^gfortran-.*' || true
+          sudo apt-get remove -y microsoft-edge-stable || true
+          sudo apt-get remove -y firefox || true
+          sudo apt-get remove -y powershell || true
+          sudo apt-get remove -y r-base-core || true
+          sudo apt-get autoremove -y
+          sudo apt-get clean
+          echo
+          echo "Listing top largest packages"
+          pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
+          head -n 30 <<< "${pkgs}"
+          echo
+          sudo rm -rfv build || true
+          sudo rm -rf /usr/share/dotnet || true
+          sudo rm -rf /opt/ghc || true
+          sudo rm -rf "/usr/local/share/boost" || true
+          sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
+          df -h
+
+      - name: Docker meta
+        id: meta
+        if: github.event_name != 'pull_request'
+        uses: docker/metadata-action@v6
+        with:
+          images: |
+            quay.io/go-skynet/local-ai-backends
+            localai/localai-backends
+          tags: |
+            type=ref,event=branch
+            type=semver,pattern={{raw}}
+            type=sha
+          flavor: |
+            latest=${{ inputs.tag-latest }}
+            suffix=${{ inputs.tag-suffix }},onlatest=true
+
+      - name: Docker meta for PR
+        id: meta_pull_request
+        if: github.event_name == 'pull_request'
+        uses: docker/metadata-action@v6
+        with:
+          images: |
+            quay.io/go-skynet/ci-tests
+          tags: |
+            type=ref,event=branch,suffix=${{ github.event.number }}-${{ inputs.backend }}-${{ inputs.build-type }}-${{ inputs.cuda-major-version }}-${{ inputs.cuda-minor-version }}
+            type=semver,pattern={{raw}},suffix=${{ github.event.number }}-${{ inputs.backend }}-${{ inputs.build-type }}-${{ inputs.cuda-major-version }}-${{ inputs.cuda-minor-version }}
+            type=sha,suffix=${{ github.event.number }}-${{ inputs.backend }}-${{ inputs.build-type }}-${{ inputs.cuda-major-version }}-${{ inputs.cuda-minor-version }}
+          flavor: |
+            latest=${{ inputs.tag-latest }}
+            suffix=${{ inputs.tag-suffix }},onlatest=true
+## End testing image
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@master
+        with:
+          platforms: all
+
+      - name: Set up Docker Buildx
+        id: buildx
+        uses: docker/setup-buildx-action@master
+
+      - name: Login to DockerHub
+        if: github.event_name != 'pull_request'
+        uses: docker/login-action@v4
+        with:
+          username: ${{ secrets.dockerUsername }}
+          password: ${{ secrets.dockerPassword }}
+
+      - name: Login to Quay.io
+        if: ${{ env.quay_username != '' }}
+        uses: docker/login-action@v4
+        with:
+          registry: quay.io
+          username: ${{ secrets.quayUsername }}
+          password: ${{ secrets.quayPassword }}
+
+      # Weekly cache-buster for the per-backend `make` step. Most Python
+      # backends list unpinned deps (torch, transformers, vllm, ...), so a
+      # warm cache freezes upstream versions indefinitely. Rolling this
+      # weekly forces a re-resolve of the install layer at most once per
+      # week, picking up newer wheels without a full cold rebuild.
+      - name: Compute deps refresh key
+        id: deps_refresh
+        run: echo "key=$(date -u +%G-W%V)" >> "$GITHUB_OUTPUT"
+
+      - name: Build and push
+        uses: docker/build-push-action@v7
+        if: github.event_name != 'pull_request'
+        with:
+          builder: ${{ steps.buildx.outputs.name }}
+          build-args: |
+            BUILD_TYPE=${{ inputs.build-type }}
+            SKIP_DRIVERS=${{ inputs.skip-drivers }}
+            CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
+            CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
+            BASE_IMAGE=${{ inputs.base-image }}
+            BACKEND=${{ inputs.backend }}
+            UBUNTU_VERSION=${{ inputs.ubuntu-version }}
+            AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
+            APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
+            APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
+            DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
+          context: ${{ inputs.context }}
+          file: ${{ inputs.dockerfile }}
+          cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
+          cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }},mode=max,ignore-error=true
+          platforms: ${{ inputs.platforms }}
+          push: ${{ github.event_name != 'pull_request' }}
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
+
+      - name: Build and push (PR)
+        uses: docker/build-push-action@v7
+        if: github.event_name == 'pull_request'
+        with:
+          builder: ${{ steps.buildx.outputs.name }}
+          build-args: |
+            BUILD_TYPE=${{ inputs.build-type }}
+            SKIP_DRIVERS=${{ inputs.skip-drivers }}
+            CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
+            CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
+            BASE_IMAGE=${{ inputs.base-image }}
+            BACKEND=${{ inputs.backend }}
+            UBUNTU_VERSION=${{ inputs.ubuntu-version }}
+            AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
+            APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
+            APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
+            DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
+          context: ${{ inputs.context }}
+          file: ${{ inputs.dockerfile }}
+          cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
+          platforms: ${{ inputs.platforms }}
+          push: ${{ env.quay_username != '' }}
+          tags: ${{ steps.meta_pull_request.outputs.tags }}
+          labels: ${{ steps.meta_pull_request.outputs.labels }}
+
+
+
+      - name: job summary
+        run: |
+          echo "Built image: ${{ steps.meta.outputs.tags || steps.meta_pull_request.outputs.tags }}" >> "$GITHUB_STEP_SUMMARY"
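The "Compute deps refresh key" step derives a weekly bucket from the ISO week number (`%V`). The following is a minimal sketch of the same bucket computed in Go, for checking what a given date maps to. The helpers `weekBucket` and `dateUTC` are hypothetical; the sketch pairs the week with the ISO week-numbering year, which is what `time.Time.ISOWeek` returns.

```go
package main

import (
	"fmt"
	"time"
)

// dateUTC is a hypothetical convenience that builds a midnight-UTC date.
func dateUTC(year int, month time.Month, day int) time.Time {
	return time.Date(year, month, day, 0, 0, 0, 0, time.UTC)
}

// weekBucket is a hypothetical helper that formats t as
// "<ISO year>-W<two-digit ISO week>".
func weekBucket(t time.Time) string {
	year, week := t.UTC().ISOWeek()
	return fmt.Sprintf("%d-W%02d", year, week)
}

func main() {
	fmt.Println(weekBucket(dateUTC(2026, time.May, 3)))     // 2026-W18
	fmt.Println(weekBucket(dateUTC(2027, time.January, 1))) // 2026-W53
}
```

The second date shows why the year component matters: 1 January 2027 still belongs to the last ISO week of 2026, so a key built from the calendar year and the ISO week would label that day `2027-W53`.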
--- a/.github/workflows/backend_build_darwin.yml
+++ b/.github/workflows/backend_build_darwin.yml
@@ -0,0 +1,271 @@
+---
+name: 'build darwin python backend container images (reusable)'
+
+on:
+  workflow_call:
+    inputs:
+      backend:
+        description: 'Backend to build'
+        required: true
+        type: string
+      build-type:
+        description: 'Build type (e.g., mps)'
+        default: ''
+        type: string
+      use-pip:
+        description: 'Use pip to install dependencies'
+        default: false
+        type: boolean
+      lang:
+        description: 'Programming language (e.g. go)'
+        default: 'python'
+        type: string
+      go-version:
+        description: 'Go version to use'
+        default: '1.24.x'
+        type: string
+      tag-suffix:
+        description: 'Tag suffix for the built image'
+        required: true
+        type: string
+      runs-on:
+        description: 'Runner to use'
+        default: 'macOS-14'
+        type: string
+    secrets:
+      dockerUsername:
+        required: false
+      dockerPassword:
+        required: false
+      quayUsername:
+        required: true
+      quayPassword:
+        required: true
+
+jobs:
+  darwin-backend-build:
+    runs-on: ${{ inputs.runs-on }}
+    strategy:
+      matrix:
+        go-version: ['${{ inputs.go-version }}']
+    env:
+      # Keep the brew Cellar stable across cache restores. Without these,
+      # `brew install` would auto-update brew itself and re-link formulas,
+      # mutating the very paths the cache just restored.
+      HOMEBREW_NO_AUTO_UPDATE: '1'
+      HOMEBREW_NO_INSTALL_CLEANUP: '1'
+      HOMEBREW_NO_ANALYTICS: '1'
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+
+      - name: Setup Go ${{ matrix.go-version }}
+        uses: actions/setup-go@v5
+        with:
+          go-version: ${{ matrix.go-version }}
+          # Caches ~/go/pkg/mod and ~/Library/Caches/go-build keyed on go.sum.
+          # Shared across every darwin matrix entry — first job in a run warms
+          # it, the rest hit warm.
+          cache: true
+
+      # You can test your matrix by printing the current Go version
+      - name: Display Go version
+        run: go version
+
+      # ---- Homebrew cache ----
+      # macOS runners have no Docker daemon, so the BuildKit registry cache used
+      # for Linux backend images (see .agents/ci-caching.md) doesn't apply here.
+      # We cache the brew downloads + Cellar entries for the formulas we install
+      # below. Read on every run, write only on master/tag pushes — same policy
+      # as the Linux registry cache.
+      - name: Restore Homebrew cache
+        id: brew-cache
+        uses: actions/cache/restore@v4
+        with:
+          path: |
+            ~/Library/Caches/Homebrew/downloads
+            /opt/homebrew/Cellar/protobuf
+            /opt/homebrew/Cellar/grpc
+            /opt/homebrew/Cellar/protoc-gen-go
+            /opt/homebrew/Cellar/protoc-gen-go-grpc
+            /opt/homebrew/Cellar/libomp
+            /opt/homebrew/Cellar/llvm
+            /opt/homebrew/Cellar/ccache
+          key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
+
+      - name: Dependencies
+        run: |
+          # ccache is always installed (used by the llama-cpp variant build) so
+          # the brew cache content stays stable across every backend in the
+          # matrix — they all share one cache key.
+          brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache
+
+      - name: Save Homebrew cache
+        if: github.event_name != 'pull_request' && steps.brew-cache.outputs.cache-hit != 'true'
+        uses: actions/cache/save@v4
+        with:
+          path: |
+            ~/Library/Caches/Homebrew/downloads
+            /opt/homebrew/Cellar/protobuf
+            /opt/homebrew/Cellar/grpc
+            /opt/homebrew/Cellar/protoc-gen-go
+            /opt/homebrew/Cellar/protoc-gen-go-grpc
+            /opt/homebrew/Cellar/libomp
+            /opt/homebrew/Cellar/llvm
+            /opt/homebrew/Cellar/ccache
+          key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
+
+      # ---- ccache for llama.cpp CMake builds ----
+      # Three CMake variants (fallback, grpc, rpc-server) compile the same
+      # llama.cpp source tree with overlapping flags — ccache dedupes object
+      # files across them. Key on the pinned LLAMA_VERSION so a pin bump
+      # invalidates cleanly. The exact key ends in run_id, so a new run misses it and
+      # restore-keys fall back to the latest entry for the same pin, keeping unchanged TUs warm.
+      - name: Compute llama.cpp version
+        if: inputs.backend == 'llama-cpp'
+        id: llama-version
+        run: |
+          version=$(grep '^LLAMA_VERSION' backend/cpp/llama-cpp/Makefile | head -1 | cut -d= -f2 | cut -d'?' -f1 | tr -d ' ')
+          echo "version=${version}" >> "$GITHUB_OUTPUT"
+
+      - name: Restore ccache
+        if: inputs.backend == 'llama-cpp'
+        id: ccache-cache
+        uses: actions/cache/restore@v4
+        with:
+          path: ~/Library/Caches/ccache
+          key: ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-${{ github.run_id }}
+          restore-keys: |
+            ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-
+
+      - name: Configure ccache
+        if: inputs.backend == 'llama-cpp'
+        run: |
+          mkdir -p "$HOME/Library/Caches/ccache"
+          ccache -M 2G
+          ccache -z
+          # llama-cpp-darwin.sh reads CMAKE_ARGS / CCACHE_DIR from env.
+          {
+            echo "CMAKE_ARGS=${CMAKE_ARGS:-} -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache"
+            echo "CCACHE_DIR=$HOME/Library/Caches/ccache"
+          } >> "$GITHUB_ENV"
+
+      # ---- Python wheel cache (uv + pip) ----
+      # Mirrors the Linux DEPS_REFRESH cadence (see .agents/ci-caching.md): the
+      # ISO-week segment of the cache key forces at most one cold rebuild per
+      # backend per week, automatically picking up newer wheels for unpinned
+      # deps (torch, mlx, diffusers, …). Restore-keys fall back to the most
+      # recent build of the same backend so off-week PRs still hit warm.
+      - name: Compute weekly cache bucket
+        if: inputs.lang == 'python'
+        id: weekly
+        run: echo "bucket=$(date -u +%G-W%V)" >> "$GITHUB_OUTPUT"
+
+      - name: Restore Python wheel cache
+        if: inputs.lang == 'python'
+        id: pyenv-cache
+        uses: actions/cache/restore@v4
+        with:
+          path: |
+            ~/Library/Caches/pip
+            ~/Library/Caches/uv
+          key: pyenv-darwin-${{ inputs.backend }}-${{ steps.weekly.outputs.bucket }}-${{ hashFiles(format('backend/python/{0}/requirements*.txt', inputs.backend)) }}
+          restore-keys: |
+            pyenv-darwin-${{ inputs.backend }}-
+
+      - name: Build ${{ inputs.backend }}-darwin
+        run: |
+          make protogen-go
+          BACKEND=${{ inputs.backend }} BUILD_TYPE=${{ inputs.build-type }} USE_PIP=${{ inputs.use-pip }} make build-darwin-${{ inputs.lang }}-backend
+
+      - name: ccache stats
+        if: inputs.backend == 'llama-cpp'
+        run: ccache -s
+
+      - name: Save ccache
+        if: inputs.backend == 'llama-cpp' && github.event_name != 'pull_request'
+        uses: actions/cache/save@v4
+        with:
+          path: ~/Library/Caches/ccache
+          key: ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-${{ github.run_id }}
+
+      - name: Save Python wheel cache
+        if: inputs.lang == 'python' && github.event_name != 'pull_request' && steps.pyenv-cache.outputs.cache-hit != 'true'
+        uses: actions/cache/save@v4
+        with:
+          path: |
+            ~/Library/Caches/pip
+            ~/Library/Caches/uv
+          key: pyenv-darwin-${{ inputs.backend }}-${{ steps.weekly.outputs.bucket }}-${{ hashFiles(format('backend/python/{0}/requirements*.txt', inputs.backend)) }}
+
+      - name: Upload ${{ inputs.backend }}.tar
+        uses: actions/upload-artifact@v7
+        with:
+          name: ${{ inputs.backend }}-tar
+          path: backend-images/${{ inputs.backend }}.tar
+
+  darwin-backend-publish:
+    needs: darwin-backend-build
+    if: github.event_name != 'pull_request'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Download ${{ inputs.backend }}.tar
+        uses: actions/download-artifact@v8
+        with:
+          name: ${{ inputs.backend }}-tar
+          path: .
+
+      - name: Install crane
+        run: |
+          curl -L https://github.com/google/go-containerregistry/releases/latest/download/go-containerregistry_Linux_x86_64.tar.gz | tar -xz
+          sudo mv crane /usr/local/bin/
+
+      - name: Log in to DockerHub
+        run: |
+          echo "${{ secrets.dockerPassword }}" | crane auth login docker.io -u "${{ secrets.dockerUsername }}" --password-stdin
+
+      - name: Log in to quay.io
+        run: |
+          echo "${{ secrets.quayPassword }}" | crane auth login quay.io -u "${{ secrets.quayUsername }}" --password-stdin
+
+      - name: Docker meta
+        id: meta
+        uses: docker/metadata-action@v6
+        with:
+          images: |
+            localai/localai-backends
+          tags: |
+            type=ref,event=branch
+            type=semver,pattern={{raw}}
+            type=sha
+          flavor: |
+            latest=auto
+            suffix=${{ inputs.tag-suffix }},onlatest=true
+
+      - name: Docker meta (Quay)
+        id: quaymeta
+        uses: docker/metadata-action@v6
+        with:
+          images: |
+            quay.io/go-skynet/local-ai-backends
+          tags: |
+            type=ref,event=branch
+            type=semver,pattern={{raw}}
+            type=sha
+          flavor: |
+            latest=auto
+            suffix=${{ inputs.tag-suffix }},onlatest=true
+
+      - name: Push Docker image (DockerHub)
+        run: |
+          for tag in $(echo "${{ steps.meta.outputs.tags }}" | tr ',' '\n'); do
+            crane push ${{ inputs.backend }}.tar $tag
+          done
+
+      - name: Push Docker image (Quay)
+        run: |
+          for tag in $(echo "${{ steps.quaymeta.outputs.tags }}" | tr ',' '\n'); do
+            crane push ${{ inputs.backend }}.tar $tag
+          done
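
The weekly cache bucket used in the Python wheel cache key above can be sketched as a small shell helper. This is an illustrative sketch, not part of the workflow: the function name `weekly_bucket` is hypothetical, and the `-d` flag assumes GNU `date` (BSD `date` on macOS runners parses dates with different flags). The sketch pairs `%V` (ISO week number) with `%G` (ISO week-based year) so that a week straddling New Year maps to a single bucket.

```shell
#!/usr/bin/env bash
# weekly_bucket [DATE]: print an ISO-8601 week bucket such as 2026-W18.
# Hypothetical helper for illustration; assumes GNU date for the -d flag.
# With no argument it uses the current time, always evaluated in UTC.
weekly_bucket() {
  # %G is the ISO week-based year and %V the ISO week number (01..53).
  date -u -d "${1:-now}" +%G-W%V
}
```

For instance, `weekly_bucket 2026-12-31` and `weekly_bucket 2027-01-01` both print `2026-W53`, because both days fall in the same ISO week, so a cache key built from this bucket would not change in the middle of that week.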
--- a/.github/workflows/backend_pr.yml
+++ b/.github/workflows/backend_pr.yml
@@ -0,0 +1,80 @@
+name: 'build backend container images (PR-filtered)'
+
+on:
+  pull_request:
+
+concurrency:
+  group: ci-backends-pr-${{ github.head_ref || github.ref }}-${{ github.repository }}
+  cancel-in-progress: true
+
+jobs:
+  generate-matrix:
+    runs-on: ubuntu-latest
+    outputs:
+      matrix: ${{ steps.set-matrix.outputs.matrix }}
+      matrix-darwin: ${{ steps.set-matrix.outputs.matrix-darwin }}
+      has-backends: ${{ steps.set-matrix.outputs.has-backends }}
+      has-backends-darwin: ${{ steps.set-matrix.outputs.has-backends-darwin }}
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6
+
+      - name: Setup Bun
+        uses: oven-sh/setup-bun@v2
+
+      - name: Install dependencies
+        run: |
+          bun add js-yaml
+          bun add @octokit/core
+
+      # filters the matrix in backend.yml
+      - name: Filter matrix for changed backends
+        id: set-matrix
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_EVENT_PATH: ${{ github.event_path }}
+        run: bun run scripts/changed-backends.js
+
+  backend-jobs:
+    needs: generate-matrix
+    uses: ./.github/workflows/backend_build.yml
+    if: needs.generate-matrix.outputs.has-backends == 'true'
+    with:
+      tag-latest: ${{ matrix.tag-latest }}
+      tag-suffix: ${{ matrix.tag-suffix }}
+      build-type: ${{ matrix.build-type }}
+      cuda-major-version: ${{ matrix.cuda-major-version }}
+      cuda-minor-version: ${{ matrix.cuda-minor-version }}
+      platforms: ${{ matrix.platforms }}
+      runs-on: ${{ matrix.runs-on }}
+      base-image: ${{ matrix.base-image }}
+      backend: ${{ matrix.backend }}
+      dockerfile: ${{ matrix.dockerfile }}
+      skip-drivers: ${{ matrix.skip-drivers }}
+      context: ${{ matrix.context }}
+      ubuntu-version: ${{ matrix.ubuntu-version }}
+      amdgpu-targets: ${{ matrix.amdgpu-targets || 'gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201' }}
+    secrets:
+      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+    strategy:
+      fail-fast: true
+      matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
+  backend-jobs-darwin:
+    needs: generate-matrix
+    uses: ./.github/workflows/backend_build_darwin.yml
+    if: needs.generate-matrix.outputs.has-backends-darwin == 'true'
+    with:
+      backend: ${{ matrix.backend }}
+      build-type: ${{ matrix.build-type }}
+      go-version: "1.24.x"
+      tag-suffix: ${{ matrix.tag-suffix }}
+      lang: ${{ matrix.lang || 'python' }}
+      use-pip: ${{ matrix.backend == 'diffusers' }}
+      runs-on: "macos-latest"
+    secrets:
+      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+    strategy:
+      fail-fast: true
+      matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix-darwin) }}
--- a/.github/workflows/build-test.yaml
+++ b/.github/workflows/build-test.yaml
@@ -0,0 +1,69 @@
+name: Build test
+
+on:
+  push:
+    branches:
+      - master
+  pull_request:
+
+jobs:
+  build-test:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+        with:
+          fetch-depth: 0
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: 1.25
+      - name: Run GoReleaser
+        run: |
+          make dev-dist
+  launcher-build-darwin:
+    runs-on: macos-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+        with:
+          fetch-depth: 0
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: 1.25
+      - name: Build launcher for macOS ARM64
+        run: |
+          make build-launcher-darwin
+          ls -liah dist
+      - name: Upload macOS launcher artifacts
+        uses: actions/upload-artifact@v7
+        with:
+          name: launcher-macos
+          path: dist/
+          retention-days: 30
+
+  launcher-build-linux:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+        with:
+          fetch-depth: 0
+      - name: Configure apt mirror on runner
+        uses: ./.github/actions/configure-apt-mirror
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: 1.25
+      - name: Build launcher for Linux
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y golang gcc libgl1-mesa-dev xorg-dev libxkbcommon-dev
+          make build-launcher-linux
+      - name: Upload Linux launcher artifacts
+        uses: actions/upload-artifact@v7
+        with:
+          name: launcher-linux
+          path: local-ai-launcher-linux.tar.xz
+          retention-days: 30
--- a/.github/workflows/bump-inference-defaults.yml
+++ b/.github/workflows/bump-inference-defaults.yml
@@ -0,0 +1,48 @@
+name: Bump inference defaults
+
+on:
+  schedule:
+    # Run daily at 06:00 UTC
+    - cron: '0 6 * * *'
+  workflow_dispatch: # Allow manual trigger
+
+permissions:
+  contents: write
+  pull-requests: write
+
+jobs:
+  bump:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+
+      - uses: actions/setup-go@v5
+        with:
+          go-version-file: go.mod
+
+      - name: Re-fetch inference defaults
+        run: make generate-force
+
+      - name: Check for changes
+        id: diff
+        run: |
+          if git diff --quiet core/config/inference_defaults.json; then
+            echo "changed=false" >> "$GITHUB_OUTPUT"
+          else
+            echo "changed=true" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Create Pull Request
+        if: steps.diff.outputs.changed == 'true'
+        uses: peter-evans/create-pull-request@v8
+        with:
+          commit-message: "chore: bump inference defaults from unsloth"
+          title: "chore: bump inference defaults from unsloth"
+          body: |
+            Auto-generated update of `core/config/inference_defaults.json` from
+            [unsloth's inference_defaults.json](https://github.com/unslothai/unsloth/blob/main/studio/backend/assets/configs/inference_defaults.json).
+
+            This PR was created automatically by the `bump-inference-defaults` workflow.
+          branch: chore/bump-inference-defaults
+          delete-branch: true
+          labels: automated
--- a/.github/workflows/bump_deps.yaml
+++ b/.github/workflows/bump_deps.yaml
@@ -1,39 +1,62 @@
-name: Bump dependencies
+name: Bump Backend dependencies
 on:
  schedule:
    - cron: 0 20 * * *
  workflow_dispatch:
 jobs:
-  bump:
+  bump-backends:
+    if: github.repository == 'mudler/LocalAI'
    strategy:
      fail-fast: false
      matrix:
        include:
-          - repository: "ggerganov/llama.cpp"
-            variable: "CPPLLAMA_VERSION"
+          - repository: "ggml-org/llama.cpp"
+            variable: "LLAMA_VERSION"
            branch: "master"
-          - repository: "ggerganov/whisper.cpp"
+            file: "backend/cpp/llama-cpp/Makefile"
+          - repository: "ikawrakow/ik_llama.cpp"
+            variable: "IK_LLAMA_VERSION"
+            branch: "main"
+            file: "backend/cpp/ik-llama-cpp/Makefile"
+          - repository: "TheTom/llama-cpp-turboquant"
+            variable: "TURBOQUANT_VERSION"
+            branch: "feature/turboquant-kv-cache"
+            file: "backend/cpp/turboquant/Makefile"
+          - repository: "ggml-org/whisper.cpp"
            variable: "WHISPER_CPP_VERSION"
            branch: "master"
-          - repository: "PABannier/bark.cpp"
-            variable: "BARKCPP_VERSION"
-            branch: "main"
+            file: "backend/go/whisper/Makefile"
          - repository: "leejet/stable-diffusion.cpp"
            variable: "STABLEDIFFUSION_GGML_VERSION"
            branch: "master"
-          - repository: "mudler/go-stable-diffusion"
-            variable: "STABLEDIFFUSION_VERSION"
-            branch: "master"
+            file: "backend/go/stablediffusion-ggml/Makefile"
          - repository: "mudler/go-piper"
            variable: "PIPER_VERSION"
            branch: "master"
+            file: "backend/go/piper/Makefile"
+          - repository: "antirez/voxtral.c"
+            variable: "VOXTRAL_VERSION"
+            branch: "main"
+            file: "backend/go/voxtral/Makefile"
+          - repository: "ace-step/acestep.cpp"
+            variable: "ACESTEP_CPP_VERSION"
+            branch: "master"
+            file: "backend/go/acestep-cpp/Makefile"
+          - repository: "PABannier/sam3.cpp"
+            variable: "SAM3_VERSION"
+            branch: "main"
+            file: "backend/go/sam3-cpp/Makefile"
+          - repository: "predict-woo/qwen3-tts.cpp"
+            variable: "QWEN3TTS_CPP_VERSION"
+            branch: "main"
+            file: "backend/go/qwen3-tts-cpp/Makefile"
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
      - name: Bump dependencies 🔧
        id: bump
        run: |
-          bash .github/bump_deps.sh ${{ matrix.repository }} ${{ matrix.branch }} ${{ matrix.variable }}
+          bash .github/bump_deps.sh ${{ matrix.repository }} ${{ matrix.branch }} ${{ matrix.variable }} ${{ matrix.file }}
          {
            echo 'message<<EOF'
            cat "${{ matrix.variable }}_message.txt"
@@ -47,7 +70,7 @@ jobs:
          rm -rfv ${{ matrix.variable }}_message.txt
          rm -rfv ${{ matrix.variable }}_commit.txt
      - name: Create Pull Request
-        uses: peter-evans/create-pull-request@v7
+        uses: peter-evans/create-pull-request@v8
        with:
          token: ${{ secrets.UPDATE_BOT_TOKEN }}
          push-to-fork: ci-forks/LocalAI
@@ -57,5 +80,37 @@ jobs:
          body: ${{ steps.bump.outputs.message }}
          signoff: true

-
-
+  bump-vllm-wheel:
+    # vLLM's cu130 wheel comes from a per-tag index URL (no /latest/ alias),
+    # so the cublas13 requirements file pins both a URL segment and a version
+    # constraint. bump_deps.sh handles git-sha-in-Makefile only — this job
+    # rewrites both values atomically when a new vLLM stable tag ships.
+    if: github.repository == 'mudler/LocalAI'
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      - name: Bump vLLM cu130 wheel pin 🔧
+        id: bump
+        run: |
+          bash .github/bump_vllm_wheel.sh vllm-project/vllm backend/python/vllm/requirements-cublas13-after.txt VLLM_VERSION
+          {
+            echo 'message<<EOF'
+            cat "VLLM_VERSION_message.txt"
+            echo EOF
+          } >> "$GITHUB_OUTPUT"
+          {
+            echo 'commit<<EOF'
+            cat "VLLM_VERSION_commit.txt"
+            echo EOF
+          } >> "$GITHUB_OUTPUT"
+          rm -rfv VLLM_VERSION_message.txt VLLM_VERSION_commit.txt
+      - name: Create Pull Request
+        uses: peter-evans/create-pull-request@v8
+        with:
+          token: ${{ secrets.UPDATE_BOT_TOKEN }}
+          push-to-fork: ci-forks/LocalAI
+          commit-message: ':arrow_up: Update vllm-project/vllm cu130 wheel'
+          title: 'chore: :arrow_up: Update vllm-project/vllm cu130 wheel to `${{ steps.bump.outputs.commit }}`'
+          branch: "update/VLLM_VERSION"
+          body: ${{ steps.bump.outputs.message }}
+          signoff: true
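
The bump jobs above pass multi-line text (the generated PR message) to later steps through the `name<<DELIMITER` syntax of `$GITHUB_OUTPUT`. A minimal sketch of that pattern as a reusable function follows. The function name `set_multiline_output` is hypothetical, and the random delimiter is a substitution for the fixed `EOF` marker used in the workflow, chosen so that a message containing a line that reads `EOF` cannot end the value early.

```shell
#!/usr/bin/env bash
# set_multiline_output NAME FILE: append FILE's contents to $GITHUB_OUTPUT
# as a multi-line step output called NAME.
# Hypothetical helper for illustration; GITHUB_OUTPUT must already be set,
# as it is inside a GitHub Actions step.
set_multiline_output() {
  local name="$1" file="$2" delim
  # Random delimiter, so the payload is very unlikely to contain it.
  delim="EOF_$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')"
  {
    echo "${name}<<${delim}"
    # Command substitution strips trailing newlines; printf adds exactly one,
    # so the closing delimiter always starts on its own line.
    printf '%s\n' "$(cat "$file")"
    echo "${delim}"
  } >> "$GITHUB_OUTPUT"
}
```

Inside a step this would be called as, for example, `set_multiline_output message "VLLM_VERSION_message.txt"`, after which a later step can read `steps.<id>.outputs.message`.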
--- a/.github/workflows/bump_docs.yaml
+++ b/.github/workflows/bump_docs.yaml
@@ -1,10 +1,11 @@
-name: Bump dependencies
+name: Bump Documentation
 on:
  schedule:
    - cron: 0 20 * * *
  workflow_dispatch:
 jobs:
-  bump:
+  bump-docs:
+    if: github.repository == 'mudler/LocalAI'
    strategy:
      fail-fast: false
      matrix:
@@ -12,12 +13,12 @@ jobs:
          - repository: "mudler/LocalAI"
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
      - name: Bump dependencies 🔧
        run: |
          bash .github/bump_docs.sh ${{ matrix.repository }}
      - name: Create Pull Request
-        uses: peter-evans/create-pull-request@v7
+        uses: peter-evans/create-pull-request@v8
        with:
          token: ${{ secrets.UPDATE_BOT_TOKEN }}
          push-to-fork: ci-forks/LocalAI
--- a/.github/workflows/checksum_checker.yaml
+++ b/.github/workflows/checksum_checker.yaml
@@ -5,22 +5,16 @@ on:
  workflow_dispatch:
 jobs:
  checksum_check:
-    runs-on: arc-runner-set
+    if: github.repository == 'mudler/LocalAI'
+    runs-on: ubuntu-latest
    steps:
-      - name: Force Install GIT latest
-        run: |
-          sudo apt-get update \
-          && sudo apt-get install -y software-properties-common \
-          && sudo apt-get update \
-          && sudo add-apt-repository -y ppa:git-core/ppa \
-          && sudo apt-get update \
-          && sudo apt-get install -y git
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
+      - name: Configure apt mirror on runner
+        uses: ./.github/actions/configure-apt-mirror
      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y pip wget
-          sudo pip install --upgrade pip
          pip install huggingface_hub
      - name: 'Setup yq'
        uses: dcarbone/install-yq-action@v1.3.1
@@ -36,7 +30,7 @@ jobs:
          sudo chmod 777 /hf_cache
          bash .github/checksum_checker.sh gallery/index.yaml
      - name: Create Pull Request
-        uses: peter-evans/create-pull-request@v7
+        uses: peter-evans/create-pull-request@v8
        with:
          token: ${{ secrets.UPDATE_BOT_TOKEN }}
          push-to-fork: ci-forks/LocalAI
--- a/.github/workflows/deploy-explorer.yaml
+++ b/.github/workflows/deploy-explorer.yaml
@@ -12,10 +12,11 @@ concurrency:

 jobs:
  build-linux:
+    if: github.repository == 'mudler/LocalAI'
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
        with:
          submodules: true
      - uses: actions/setup-go@v5
@@ -31,9 +32,9 @@ jobs:
          make protogen-go
      - name: Build api
        run: |
-          CGO_ENABLED=0 make build-api
+          CGO_ENABLED=0 make build
      - name: rm
-        uses: appleboy/ssh-action@v1.2.0
+        uses: appleboy/ssh-action@v1.2.5
        with:
            host: ${{ secrets.EXPLORER_SSH_HOST }}
            username: ${{ secrets.EXPLORER_SSH_USERNAME }}
@@ -42,7 +43,7 @@ jobs:
            script: |
                sudo rm -rf local-ai/ || true
      - name: copy file via ssh
-        uses: appleboy/scp-action@v0.1.7
+        uses: appleboy/scp-action@v1.0.0
        with:
            host: ${{ secrets.EXPLORER_SSH_HOST }}
            username: ${{ secrets.EXPLORER_SSH_USERNAME }}
@@ -53,7 +54,7 @@ jobs:
            rm: true
            target: ./local-ai
      - name: restarting
-        uses: appleboy/ssh-action@v1.2.0
+        uses: appleboy/ssh-action@v1.2.5
        with:
            host: ${{ secrets.EXPLORER_SSH_HOST }}
            username: ${{ secrets.EXPLORER_SSH_USERNAME }}
--- a/.github/workflows/disabled/dependabot_auto.yml
+++ b/.github/workflows/disabled/dependabot_auto.yml
@@ -9,18 +9,18 @@ permissions:

 jobs:
  dependabot:
+    if: github.repository == 'mudler/LocalAI' && github.actor == 'dependabot[bot]'
    runs-on: ubuntu-latest
-    if: ${{ github.actor == 'dependabot[bot]' }}
    steps:
      - name: Dependabot metadata
        id: metadata
-        uses: dependabot/fetch-metadata@v2.3.0
+        uses: dependabot/fetch-metadata@v2.5.0
        with:
          github-token: "${{ secrets.GITHUB_TOKEN }}"
          skip-commit-verification: true

      - name: Checkout repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6

      - name: Approve a PR if not already approved
        run: |
--- a/.github/workflows/disabled/labeler.yml
+++ b/.github/workflows/disabled/labeler.yml
@@ -9,4 +9,4 @@ jobs:
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
-    - uses: actions/labeler@v5
+    - uses: actions/labeler@v6
--- a/.github/workflows/disabled/localaibot_automerge.yml
+++ b/.github/workflows/disabled/localaibot_automerge.yml
@@ -6,14 +6,15 @@ permissions:
  contents: write
  pull-requests: write
  packages: read
-
+  issues: write # for Homebrew/actions/post-comment
+  actions: write # to dispatch publish workflow
 jobs:
  dependabot:
+    if: github.repository == 'mudler/LocalAI' && github.actor == 'localai-bot' && contains(github.event.pull_request.title, 'chore:')
    runs-on: ubuntu-latest
-    if: ${{ github.actor == 'localai-bot' }}
    steps:
      - name: Checkout repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6

      - name: Approve a PR if not already approved
        run: |
--- a/.github/workflows/disabled/notify-models.yaml
+++ b/.github/workflows/disabled/notify-models.yaml
@@ -1,24 +1,29 @@
 name: Notifications for new models
 on:
-  pull_request:
+  pull_request_target:
     types:
       - closed

+permissions:
+  contents: read
+  pull-requests: read
+
 jobs:
  notify-discord:
-    if: ${{ (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model')) }}
+    if: github.repository == 'mudler/LocalAI' && (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model'))
    env:
-        MODEL_NAME: hermes-2-theta-llama-3-8b
+        MODEL_NAME: gemma-3-12b-it-qat
    runs-on: ubuntu-latest
    steps:
-    - uses: actions/checkout@v4
+    - uses: actions/checkout@v6
      with:
        fetch-depth: 0 # needed to checkout all branches for this Action to work
+        ref: ${{ github.event.pull_request.head.sha }} # Checkout the PR head to get the actual changes
    - uses: mudler/localai-github-action@v1
      with:
-        model: 'hermes-2-theta-llama-3-8b' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
+        model: 'gemma-3-12b-it-qat' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
        # Check the PR diff using the current branch and the base branch of the PR
-    - uses: GrantBirki/git-diff-action@v2.8.0
+    - uses: GrantBirki/git-diff-action@v2.8.1
      id: git-diff-action
      with:
            json_diff_file_output: diff.json
@@ -79,27 +84,28 @@ jobs:
        args: ${{ steps.summarize.outputs.message }}
    - name: Setup tmate session if fails
      if: ${{ failure() }}
-      uses: mxschmitt/action-tmate@v3.19
+      uses: mxschmitt/action-tmate@v3.23
      with:
        detached: true
        connect-timeout-seconds: 180
        limit-access-to-actor: true
  notify-twitter:
-    if: ${{ (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model')) }}
+    if: github.repository == 'mudler/LocalAI' && (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model'))
    env:
-        MODEL_NAME: hermes-2-theta-llama-3-8b
+        MODEL_NAME: gemma-3-12b-it-qat
    runs-on: ubuntu-latest
    steps:
-    - uses: actions/checkout@v4
+    - uses: actions/checkout@v6
      with:
        fetch-depth: 0 # needed to checkout all branches for this Action to work
+        ref: ${{ github.event.pull_request.head.sha }} # Checkout the PR head to get the actual changes
    - name: Start LocalAI
      run: |
        echo "Starting LocalAI..."
-        docker run -e -ti -d --name local-ai -p 8080:8080 localai/localai:master-ffmpeg-core run --debug $MODEL_NAME
+        docker run -d --name local-ai -p 8080:8080 localai/localai:master run --debug $MODEL_NAME
        until [ "`docker inspect -f {{.State.Health.Status}} local-ai`" == "healthy" ]; do echo "Waiting for container to be ready";  docker logs --tail 10 local-ai; sleep 2; done
      # Check the PR diff using the current branch and the base branch of the PR
-    - uses: GrantBirki/git-diff-action@v2.8.0
+    - uses: GrantBirki/git-diff-action@v2.8.1
      id: git-diff-action
      with:
            json_diff_file_output: diff.json
@@ -161,7 +167,7 @@ jobs:
        TWITTER_ACCESS_TOKEN_SECRET: ${{ secrets.TWITTER_ACCESS_TOKEN_SECRET }}
    - name: Setup tmate session if fails
      if: ${{ failure() }}
-      uses: mxschmitt/action-tmate@v3.19
+      uses: mxschmitt/action-tmate@v3.23
      with:
        detached: true
        connect-timeout-seconds: 180
--- a/.github/workflows/disabled/prlint.yaml
+++ b/.github/workflows/disabled/prlint.yaml
--- a/.github/workflows/gallery-agent.yaml
+++ b/.github/workflows/gallery-agent.yaml
@@ -0,0 +1,214 @@
+name: Gallery Agent
+on:
+
+  schedule:
+    - cron: '0 */12 * * *'  # Run every 12 hours
+  workflow_dispatch:
+    inputs:
+      search_term:
+        description: 'Search term for models'
+        required: false
+        default: 'GGUF'
+        type: string
+      limit:
+        description: 'Maximum number of models to process'
+        required: false
+        default: '15'
+        type: string
+      quantization:
+        description: 'Preferred quantization format'
+        required: false
+        default: 'Q4_K_M'
+        type: string
+      max_models:
+        description: 'Maximum number of models to add to the gallery'
+        required: false
+        default: '1'
+        type: string
+jobs:
+  gallery-agent:
+    if: github.repository == 'mudler/LocalAI'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6
+        with:
+          token: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.21'
+      - name: Proto Dependencies
+        run: |
+          # Install protoc
+          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
+          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+          rm protoc.zip
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
+          PATH="$PATH:$HOME/go/bin" make protogen-go
+      - name: Process gallery-agent PR commands
+        env:
+          GH_TOKEN: ${{ secrets.UPDATE_BOT_TOKEN }}
+          REPO: ${{ github.repository }}
+          SEARCH: 'gallery agent in:title'
+        run: |
+          # Walk gallery-agent PRs and act on maintainer comments:
+          #   /gallery-agent blacklist → label `gallery-agent/blacklisted` + close (never repropose)
+          #   /gallery-agent recreate  → close without label (next run may repropose)
+          # Only comments from OWNER / MEMBER / COLLABORATOR are honored so
+          # random users can't drive the bot.
+          #
+          # We scan both open PRs AND recently-closed PRs that don't already
+          # carry the blacklist label. This covers the common flow where a
+          # maintainer writes /gallery-agent blacklist and immediately clicks
+          # Close — without this, the next scheduled run wouldn't see the
+          # command (PR is already closed) and would repropose the model.
+          gh label create gallery-agent/blacklisted \
+            --repo "$REPO" --color ededed \
+            --description "gallery-agent must not repropose this model" 2>/dev/null || true
+
+          prs_open=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" \
+            --json number --jq '.[].number')
+          # Closed PRs from the last 14 days that don't yet have the blacklist label.
+          # Bounded window keeps the scan cheap while covering late-applied commands.
+          since=$(date -u -d '14 days ago' +%Y-%m-%d)
+          prs_closed=$(gh pr list --repo "$REPO" --state closed \
+            --search "$SEARCH closed:>=$since -label:gallery-agent/blacklisted" \
+            --json number --jq '.[].number')
+          prs=$(printf '%s\n%s\n' "$prs_open" "$prs_closed" | sort -u | sed '/^$/d')
+          for pr in $prs; do
+            state=$(gh pr view "$pr" --repo "$REPO" --json state --jq '.state')
+            cmds=$(gh pr view "$pr" --repo "$REPO" --json comments \
+              --jq '.comments[] | select(.authorAssociation=="OWNER" or .authorAssociation=="MEMBER" or .authorAssociation=="COLLABORATOR") | .body')
+            if echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+blacklist([[:space:]]|$)'; then
+              echo "PR #$pr: blacklist command found (state=$state)"
+              gh pr edit "$pr" --repo "$REPO" --add-label gallery-agent/blacklisted || true
+              if [ "$state" = "OPEN" ]; then
+                gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
+              fi
+            elif [ "$state" = "OPEN" ] && echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
+              echo "PR #$pr: recreate command found"
+              gh pr close "$pr" --repo "$REPO" --comment "Closed via \`/gallery-agent recreate\`. The next scheduled run will propose this model again." || true
+            fi
+          done
+
+      - name: Collect skip URLs for the gallery agent
+        id: open_prs
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          REPO: ${{ github.repository }}
+          SEARCH: 'gallery agent in:title'
+        run: |
+          # Skip set =
+          #   URLs from any open gallery-agent PR (avoid duplicate PRs for the same model while one is pending)
+          # + URLs from closed PRs carrying the `gallery-agent/blacklisted` label (hard blacklist)
+          # Plain-closed PRs without the label are ignored — closing a PR is
+          # not by itself a "never propose again" signal; maintainers must
+          # opt in via the /gallery-agent blacklist comment command.
+          urls_open=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" \
+            --json body --jq '[.[].body] | join("\n")' \
+            | grep -oE 'https://huggingface\.co/[^ )]+' || true)
+          urls_blacklist=$(gh pr list --repo "$REPO" --state closed --search "$SEARCH" \
+            --label gallery-agent/blacklisted \
+            --json body --jq '[.[].body] | join("\n")' \
+            | grep -oE 'https://huggingface\.co/[^ )]+' || true)
+          urls=$(printf '%s\n%s\n' "$urls_open" "$urls_blacklist" | sort -u | sed '/^$/d')
+          echo "Skip URLs:"
+          echo "$urls"
+          {
+            echo "urls<<EOF"
+            echo "$urls"
+            echo "EOF"
+          } >> "$GITHUB_OUTPUT"
+
+      - name: Run gallery agent
+        env:
+          SEARCH_TERM: ${{ github.event.inputs.search_term || 'GGUF' }}
+          LIMIT: ${{ github.event.inputs.limit || '15' }}
+          QUANTIZATION: ${{ github.event.inputs.quantization || 'Q4_K_M' }}
+          MAX_MODELS: ${{ github.event.inputs.max_models || '1' }}
+          EXTRA_SKIP_URLS: ${{ steps.open_prs.outputs.urls }}
+        run: |
+          export GALLERY_INDEX_PATH=$PWD/gallery/index.yaml
+          go run ./.github/gallery-agent
+
+      - name: Check for changes
+        id: check_changes
+        run: |
+          if git diff --quiet gallery/index.yaml; then
+            echo "changes=false" >> "$GITHUB_OUTPUT"
+            echo "No changes detected in gallery/index.yaml"
+          else
+            echo "changes=true" >> "$GITHUB_OUTPUT"
+            echo "Changes detected in gallery/index.yaml"
+            git diff gallery/index.yaml
+          fi
+
+      - name: Read gallery agent summary
+        id: read_summary
+        if: steps.check_changes.outputs.changes == 'true'
+        run: |
+          if [ -f "./gallery-agent-summary.json" ]; then
+            echo "summary_exists=true" >> "$GITHUB_OUTPUT"
+            # Extract summary data using jq
+            echo "search_term=$(jq -r '.search_term' ./gallery-agent-summary.json)" >> "$GITHUB_OUTPUT"
+            echo "total_found=$(jq -r '.total_found' ./gallery-agent-summary.json)" >> "$GITHUB_OUTPUT"
+            echo "models_added=$(jq -r '.models_added' ./gallery-agent-summary.json)" >> "$GITHUB_OUTPUT"
+            echo "quantization=$(jq -r '.quantization' ./gallery-agent-summary.json)" >> "$GITHUB_OUTPUT"
+            echo "processing_time=$(jq -r '.processing_time' ./gallery-agent-summary.json)" >> "$GITHUB_OUTPUT"
+
+            # Create a formatted list of added models with URLs (one Markdown bullet per model)
+            added_models=$(jq -r 'range(0; .added_model_ids | length) as $i | "- [\(.added_model_ids[$i])](\(.added_model_urls[$i]))"' ./gallery-agent-summary.json)
+            echo "added_models<<EOF" >> "$GITHUB_OUTPUT"
+            echo "$added_models" >> "$GITHUB_OUTPUT"
+            echo "EOF" >> "$GITHUB_OUTPUT"
+            rm -f ./gallery-agent-summary.json
+          else
+            echo "summary_exists=false" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Create Pull Request
+        if: steps.check_changes.outputs.changes == 'true'
+        uses: peter-evans/create-pull-request@v8
+        with:
+          token: ${{ secrets.UPDATE_BOT_TOKEN }}
+          push-to-fork: ci-forks/LocalAI
+          commit-message: 'chore(model gallery): :robot: add new models via gallery agent'
+          title: 'chore(model gallery): :robot: add ${{ steps.read_summary.outputs.models_added || 0 }} new models via gallery agent'
+          # The branch name must be unique so that PRs do not overwrite each other
+          branch-suffix: timestamp
+          body: |
+            This PR was automatically created by the gallery agent workflow.
+            
+            **Summary:**
+            - **Search Term:** ${{ steps.read_summary.outputs.search_term || github.event.inputs.search_term || 'GGUF' }}
+            - **Models Found:** ${{ steps.read_summary.outputs.total_found || 'N/A' }}
+            - **Models Added:** ${{ steps.read_summary.outputs.models_added || '0' }}
+            - **Quantization:** ${{ steps.read_summary.outputs.quantization || github.event.inputs.quantization || 'Q4_K_M' }}
+            - **Processing Time:** ${{ steps.read_summary.outputs.processing_time || 'N/A' }}
+            
+            **Added Models:**
+            ${{ steps.read_summary.outputs.added_models || '- No models added' }}
+
+            ### Bot commands
+
+            Maintainers (owner / member / collaborator) can control this PR
+            by leaving a comment with one of:
+
+            - `/gallery-agent recreate` — close this PR; the next scheduled
+              run will propose this model again (useful if the entry needs
+              to be regenerated with fresh metadata).
+            - `/gallery-agent blacklist` — close this PR and permanently
+              prevent the gallery agent from ever reproposing this model.
+
+            Plain "Close" (without a command) is treated as a no-op: the
+            model may be reproposed by a future run.
+
+            **Workflow Details:**
+            - Triggered by: `${{ github.event_name }}`
+            - Run ID: `${{ github.run_id }}`
+            - Commit: `${{ github.sha }}`
+          signoff: true
+          delete-branch: true
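The workflow above passes multi-line values (`urls`, `added_models`) between steps by writing a `name<<DELIMITER` block to the file named by `$GITHUB_OUTPUT`. The following is a minimal local sketch of that pattern, not part of this change. The helper name `write_multiline_output`, the `GHADELIM_` prefix, and the `example.invalid` URLs are hypothetical. It uses a per-process delimiter instead of the fixed `EOF`, on the assumption that a value could itself contain a line reading `EOF`.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: append a multi-line value to an output file using the
# `name<<DELIMITER` form that GitHub Actions reads from $GITHUB_OUTPUT.
#   $1 = output name, $2 = value (may span several lines), $3 = target file
write_multiline_output() {
  # Assumption: a delimiter derived from the shell PID is unlikely to appear
  # inside the value, unlike a fixed "EOF".
  delimiter="GHADELIM_$$"
  {
    printf '%s<<%s\n' "$1" "$delimiter"
    printf '%s\n' "$2"
    printf '%s\n' "$delimiter"
  } >> "$3"
}

# Local demonstration with a temporary file; inside a workflow step the
# target would be "$GITHUB_OUTPUT" instead.
out_file="$(mktemp)"
models="$(printf '%s\n%s' '- [a](https://example.invalid/a)' '- [b](https://example.invalid/b)')"
write_multiline_output added_models "$models" "$out_file"
cat "$out_file"
```

In a real step the call would look like `write_multiline_output added_models "$added_models" "$GITHUB_OUTPUT"`, and a later step would read the value through `steps.<id>.outputs.added_models`.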
--- a/.github/workflows/generate_grpc_cache.yaml
+++ b/.github/workflows/generate_grpc_cache.yaml
@@ -1,94 +0,0 @@
-name: 'generate and publish GRPC docker caches'
-
-on:
-  workflow_dispatch:
-  push:
-    branches:
-      - master
-
-concurrency:
-  group: grpc-cache-${{ github.head_ref || github.ref }}-${{ github.repository }}
-  cancel-in-progress: true
-
-jobs:
-  generate_caches:
-    strategy:
-      matrix:
-        include:
-          - grpc-base-image: ubuntu:22.04
-            runs-on: 'ubuntu-latest'
-            platforms: 'linux/amd64,linux/arm64'
-    runs-on: ${{matrix.runs-on}}
-    steps:
-      - name: Release space from worker
-        if: matrix.runs-on == 'ubuntu-latest'
-        run: |
-          echo "Listing top largest packages"
-          pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
-          head -n 30 <<< "${pkgs}"
-          echo
-          df -h
-          echo
-          sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
-          sudo apt-get remove --auto-remove android-sdk-platform-tools || true
-          sudo apt-get purge --auto-remove android-sdk-platform-tools || true
-          sudo rm -rf /usr/local/lib/android
-          sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
-          sudo rm -rf /usr/share/dotnet
-          sudo apt-get remove -y '^mono-.*' || true
-          sudo apt-get remove -y '^ghc-.*' || true
-          sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
-          sudo apt-get remove -y 'php.*' || true
-          sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
-          sudo apt-get remove -y '^google-.*' || true
-          sudo apt-get remove -y azure-cli || true
-          sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
-          sudo apt-get remove -y '^gfortran-.*' || true
-          sudo apt-get remove -y microsoft-edge-stable || true
-          sudo apt-get remove -y firefox || true
-          sudo apt-get remove -y powershell || true
-          sudo apt-get remove -y r-base-core || true
-          sudo apt-get autoremove -y
-          sudo apt-get clean
-          echo
-          echo "Listing top largest packages"
-          pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
-          head -n 30 <<< "${pkgs}"
-          echo
-          sudo rm -rfv build || true
-          sudo rm -rf /usr/share/dotnet || true
-          sudo rm -rf /opt/ghc || true
-          sudo rm -rf "/usr/local/share/boost" || true
-          sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
-          df -h
-
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@master
-        with:
-          platforms: all
-
-      - name: Set up Docker Buildx
-        id: buildx
-        uses: docker/setup-buildx-action@master
-
-      - name: Checkout
-        uses: actions/checkout@v4
-
-      - name: Cache GRPC
-        uses: docker/build-push-action@v6
-        with:
-          builder: ${{ steps.buildx.outputs.name }}
-          # The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
-          # This means that even the MAKEFLAGS have to be an EXACT match.
-          # If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
-          build-args: |
-            GRPC_BASE_IMAGE=${{ matrix.grpc-base-image }}
-            GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
-            GRPC_VERSION=v1.65.0
-          context: .
-          file: ./Dockerfile
-          cache-to: type=gha,ignore-error=true
-          cache-from: type=gha
-          target: grpc
-          platforms: ${{ matrix.platforms }}
-          push: false
--- a/.github/workflows/generate_intel_image.yaml
+++ b/.github/workflows/generate_intel_image.yaml
@@ -12,11 +12,12 @@ concurrency:

 jobs:
  generate_caches:
+    if: github.repository == 'mudler/LocalAI'
    strategy:
      matrix:
        include:
-          - base-image: intel/oneapi-basekit:2025.0.0-0-devel-ubuntu22.04
-            runs-on: 'ubuntu-latest'
+          - base-image: intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04
+            runs-on: 'arc-runner-set'
            platforms: 'linux/amd64'
    runs-on: ${{matrix.runs-on}}
    steps:
@@ -26,14 +27,14 @@ jobs:
          platforms: all
      - name: Login to DockerHub
        if: github.event_name != 'pull_request'
-        uses: docker/login-action@v3
+        uses: docker/login-action@v4
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_PASSWORD }}

      - name: Login to quay
        if: github.event_name != 'pull_request'
-        uses: docker/login-action@v3
+        uses: docker/login-action@v4
        with:
          registry: quay.io
          username: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
@@ -43,17 +44,17 @@ jobs:
        uses: docker/setup-buildx-action@master

      - name: Checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6

      - name: Cache Intel images
-        uses: docker/build-push-action@v6
+        uses: docker/build-push-action@v7
        with:
          builder: ${{ steps.buildx.outputs.name }}
          build-args: |
            BASE_IMAGE=${{ matrix.base-image }}
          context: .
          file: ./Dockerfile
-          tags: quay.io/go-skynet/intel-oneapi-base:latest
+          tags: quay.io/go-skynet/intel-oneapi-base:24.04
          push: true
          target: intel
          platforms: ${{ matrix.platforms }}
--- a/.github/workflows/gh-pages.yml
+++ b/.github/workflows/gh-pages.yml
@@ -0,0 +1,75 @@
+name: Deploy docs to GitHub Pages
+
+on:
+  push:
+    branches:
+      - master
+    paths:
+      - 'docs/**'
+      - 'gallery/**'
+      - 'images/**'
+      - '.github/ci/modelslist.go'
+      - '.github/workflows/gh-pages.yml'
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: pages
+  cancel-in-progress: false
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    env:
+      HUGO_VERSION: "0.146.3"
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+        with:
+          fetch-depth: 0  # needed for enableGitInfo
+          submodules: true
+
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.22'
+          cache: false
+
+      - name: Setup Hugo
+        uses: peaceiris/actions-hugo@v3
+        with:
+          hugo-version: ${{ env.HUGO_VERSION }}
+          extended: true
+
+      - name: Setup Pages
+        id: pages
+        uses: actions/configure-pages@v6
+
+      - name: Generate gallery
+        run: go run ./.github/ci/modelslist.go ./gallery/index.yaml > docs/static/gallery.html
+
+      - name: Build site
+        working-directory: docs
+        run: |
+          mkdir -p layouts/_default
+          hugo --minify --baseURL "${{ steps.pages.outputs.base_url }}/"
+
+      - name: Upload artifact
+        uses: actions/upload-pages-artifact@v5
+        with:
+          path: docs/public
+
+  deploy:
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    needs: build
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v5
--- a/.github/workflows/image-pr.yml
+++ b/.github/workflows/image-pr.yml
@@ -1,140 +1,92 @@
 ---
-name: 'build container images tests'
-
-on:
-  pull_request:
-
-concurrency:
-  group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
-  cancel-in-progress: true
-
-jobs:
-  extras-image-build:
-    uses: ./.github/workflows/image_build.yml
-    with:
-      tag-latest: ${{ matrix.tag-latest }}
-      tag-suffix: ${{ matrix.tag-suffix }}
-      ffmpeg: ${{ matrix.ffmpeg }}
-      image-type: ${{ matrix.image-type }}
-      build-type: ${{ matrix.build-type }}
-      cuda-major-version: ${{ matrix.cuda-major-version }}
-      cuda-minor-version: ${{ matrix.cuda-minor-version }}
-      platforms: ${{ matrix.platforms }}
-      runs-on: ${{ matrix.runs-on }}
-      base-image: ${{ matrix.base-image }}
-      grpc-base-image: ${{ matrix.grpc-base-image }}
-      makeflags: ${{ matrix.makeflags }}
-    secrets:
-      dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
-      dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
-      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-    strategy:
-      # Pushing with all jobs in parallel
-      # eats the bandwidth of all the nodes
-      max-parallel: ${{ github.event_name != 'pull_request' && 4 || 8 }}
-      matrix:
-        include:
-          # This is basically covered by the AIO test
-          # - build-type: ''
-          #   platforms: 'linux/amd64'
-          #   tag-latest: 'false'
-          #   tag-suffix: '-ffmpeg'
-          #   ffmpeg: 'true'
-          #   image-type: 'extras'
-          #   runs-on: 'arc-runner-set'
-          #   base-image: "ubuntu:22.04"
-          #   makeflags: "--jobs=3 --output-sync=target"
-          - build-type: 'cublas'
-            cuda-major-version: "12"
-            cuda-minor-version: "0"
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            tag-suffix: '-cublas-cuda12-ffmpeg'
-            ffmpeg: 'true'
-            image-type: 'extras'
-            runs-on: 'arc-runner-set'
-            base-image: "ubuntu:22.04"
-            makeflags: "--jobs=3 --output-sync=target"
-          # - build-type: 'hipblas'
-          #   platforms: 'linux/amd64'
-          #   tag-latest: 'false'
-          #   tag-suffix: '-hipblas'
-          #   ffmpeg: 'false'
-          #   image-type: 'extras'
-          #   base-image: "rocm/dev-ubuntu-22.04:6.1"
-          #   grpc-base-image: "ubuntu:22.04"
-          #   runs-on: 'arc-runner-set'
-          #   makeflags: "--jobs=3 --output-sync=target"
-          # - build-type: 'sycl_f16'
-          #   platforms: 'linux/amd64'
-          #   tag-latest: 'false'
-          #   base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
-          #   grpc-base-image: "ubuntu:22.04"
-          #   tag-suffix: 'sycl-f16-ffmpeg'
-          #   ffmpeg: 'true'
-          #   image-type: 'extras'
-          #   runs-on: 'arc-runner-set'
-          #   makeflags: "--jobs=3 --output-sync=target"
-  # core-image-build:
-  #   uses: ./.github/workflows/image_build.yml
-  #   with:
-  #     tag-latest: ${{ matrix.tag-latest }}
-  #     tag-suffix: ${{ matrix.tag-suffix }}
-  #     ffmpeg: ${{ matrix.ffmpeg }}
-  #     image-type: ${{ matrix.image-type }}
-  #     build-type: ${{ matrix.build-type }}
-  #     cuda-major-version: ${{ matrix.cuda-major-version }}
-  #     cuda-minor-version: ${{ matrix.cuda-minor-version }}
-  #     platforms: ${{ matrix.platforms }}
-  #     runs-on: ${{ matrix.runs-on }}
-  #     base-image: ${{ matrix.base-image }}
-  #     grpc-base-image: ${{ matrix.grpc-base-image }}
-  #     makeflags: ${{ matrix.makeflags }}
-  #   secrets:
-  #     dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
-  #     dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
-  #     quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-  #     quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-  #   strategy:
-  #     matrix:
-  #       include:
-          # - build-type: ''
-          #   platforms: 'linux/amd64'
-          #   tag-latest: 'false'
-          #   tag-suffix: '-ffmpeg-core'
-          #   ffmpeg: 'true'
-          #   image-type: 'core'
-          #   runs-on: 'ubuntu-latest'
-          #   base-image: "ubuntu:22.04"
-          #   makeflags: "--jobs=4 --output-sync=target"
-          # - build-type: 'sycl_f16'
-          #   platforms: 'linux/amd64'
-          #   tag-latest: 'false'
-          #   base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
-          #   grpc-base-image: "ubuntu:22.04"
-          #   tag-suffix: 'sycl-f16-ffmpeg-core'
-          #   ffmpeg: 'true'
-          #   image-type: 'core'
-          #   runs-on: 'arc-runner-set'
-          #   makeflags: "--jobs=3 --output-sync=target"
-          # - build-type: 'cublas'
-          #   cuda-major-version: "12"
-          #   cuda-minor-version: "0"
-          #   platforms: 'linux/amd64'
-          #   tag-latest: 'false'
-          #   tag-suffix: '-cublas-cuda12-ffmpeg-core'
-          #   ffmpeg: 'true'
-          #   image-type: 'core'
-          #   runs-on: 'ubuntu-latest'
-          #   base-image: "ubuntu:22.04"
-          #   makeflags: "--jobs=4 --output-sync=target"
-          # - build-type: 'vulkan'
-          #   platforms: 'linux/amd64'
-          #   tag-latest: 'false'
-          #   tag-suffix: '-vulkan-ffmpeg-core'
-          #   ffmpeg: 'true'
-          #   image-type: 'core'
-          #   runs-on: 'ubuntu-latest'
-          #   base-image: "ubuntu:22.04"
-          #   makeflags: "--jobs=4 --output-sync=target"
+  name: 'build container images tests'
+  
+  on:
+    pull_request:
+  
+  concurrency:
+    group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
+    cancel-in-progress: true
+  
+  jobs:
+    image-build:
+      uses: ./.github/workflows/image_build.yml
+      with:
+        tag-latest: ${{ matrix.tag-latest }}
+        tag-suffix: ${{ matrix.tag-suffix }}
+        build-type: ${{ matrix.build-type }}
+        cuda-major-version: ${{ matrix.cuda-major-version }}
+        cuda-minor-version: ${{ matrix.cuda-minor-version }}
+        platforms: ${{ matrix.platforms }}
+        runs-on: ${{ matrix.runs-on }}
+        base-image: ${{ matrix.base-image }}
+        makeflags: ${{ matrix.makeflags }}
+        ubuntu-version: ${{ matrix.ubuntu-version }}
+      secrets:
+        dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
+        dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
+        quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+        quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+      strategy:
+        # Pushing with all jobs in parallel
+        # eats the bandwidth of all the nodes
+        max-parallel: ${{ github.event_name != 'pull_request' && 4 || 8 }}
+        fail-fast: false
+        matrix:
+          include:
+            - build-type: 'cublas'
+              cuda-major-version: "12"
+              cuda-minor-version: "8"
+              platforms: 'linux/amd64'
+              tag-latest: 'false'
+              tag-suffix: '-gpu-nvidia-cuda-12'
+              runs-on: 'ubuntu-latest'
+              base-image: "ubuntu:24.04"
+              makeflags: "--jobs=3 --output-sync=target"
+              ubuntu-version: '2404'
+            - build-type: 'cublas'
+              cuda-major-version: "13"
+              cuda-minor-version: "0"
+              platforms: 'linux/amd64'
+              tag-latest: 'false'
+              tag-suffix: '-gpu-nvidia-cuda-13'
+              runs-on: 'ubuntu-latest'
+              base-image: "ubuntu:22.04"
+              makeflags: "--jobs=3 --output-sync=target"
+              ubuntu-version: '2404'
+            - build-type: 'hipblas'
+              platforms: 'linux/amd64'
+              tag-latest: 'false'
+              tag-suffix: '-hipblas'
+              base-image: "rocm/dev-ubuntu-24.04:7.2.1"
+              runs-on: 'ubuntu-latest'
+              makeflags: "--jobs=3 --output-sync=target"
+              ubuntu-version: '2404'
+            - build-type: 'sycl'
+              platforms: 'linux/amd64'
+              tag-latest: 'false'
+              base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
+              tag-suffix: 'sycl'
+              runs-on: 'ubuntu-latest'
+              makeflags: "--jobs=3 --output-sync=target"
+              ubuntu-version: '2404'
+            - build-type: 'vulkan'
+              platforms: 'linux/amd64,linux/arm64'
+              tag-latest: 'false'
+              tag-suffix: '-vulkan-core'
+              runs-on: 'ubuntu-latest'
+              base-image: "ubuntu:24.04"
+              makeflags: "--jobs=4 --output-sync=target"
+              ubuntu-version: '2404'
+            - build-type: 'cublas'
+              cuda-major-version: "13"
+              cuda-minor-version: "0"
+              platforms: 'linux/arm64'
+              tag-latest: 'false'
+              tag-suffix: '-nvidia-l4t-arm64-cuda-13'
+              base-image: "ubuntu:24.04"
+              runs-on: 'ubuntu-24.04-arm'
+              makeflags: "--jobs=4 --output-sync=target"
+              skip-drivers: 'false'
+              ubuntu-version: '2404'
+  
--- a/.github/workflows/image.yml
+++ b/.github/workflows/image.yml
@@ -1,404 +1,176 @@
 ---
-name: 'build container images'
+  name: 'build container images'
+  
+  on:
+    push:
+      branches:
+        - master
+      tags:
+        - '*'
+  
+  concurrency:
+    group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
+    cancel-in-progress: true
+  
+  jobs:
+    hipblas-jobs:
+      if: github.repository == 'mudler/LocalAI'
+      uses: ./.github/workflows/image_build.yml
+      with:
+        tag-latest: ${{ matrix.tag-latest }}
+        tag-suffix: ${{ matrix.tag-suffix }}
+        build-type: ${{ matrix.build-type }}
+        cuda-major-version: ${{ matrix.cuda-major-version }}
+        cuda-minor-version: ${{ matrix.cuda-minor-version }}
+        platforms: ${{ matrix.platforms }}
+        runs-on: ${{ matrix.runs-on }}
+        base-image: ${{ matrix.base-image }}
+        makeflags: ${{ matrix.makeflags }}
+        ubuntu-version: ${{ matrix.ubuntu-version }}
+        ubuntu-codename: ${{ matrix.ubuntu-codename }}
+      secrets:
+        dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
+        dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
+        quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+        quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+      strategy:
+        matrix:
+          include:
+            - build-type: 'hipblas'
+              platforms: 'linux/amd64'
+              tag-latest: 'auto'
+              tag-suffix: '-gpu-hipblas'
+              base-image: "rocm/dev-ubuntu-24.04:7.2.1"
+              runs-on: 'ubuntu-latest'
+              makeflags: "--jobs=3 --output-sync=target"
+              ubuntu-version: '2404'
+              ubuntu-codename: 'noble'

-on:
-  push:
-    branches:
-      - master
-    tags:
-      - '*'
-
-concurrency:
-  group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
-  cancel-in-progress: true
-
-jobs:
-  hipblas-jobs:
-    uses: ./.github/workflows/image_build.yml
-    with:
-      tag-latest: ${{ matrix.tag-latest }}
-      tag-suffix: ${{ matrix.tag-suffix }}
-      ffmpeg: ${{ matrix.ffmpeg }}
-      image-type: ${{ matrix.image-type }}
-      build-type: ${{ matrix.build-type }}
-      cuda-major-version: ${{ matrix.cuda-major-version }}
-      cuda-minor-version: ${{ matrix.cuda-minor-version }}
-      platforms: ${{ matrix.platforms }}
-      runs-on: ${{ matrix.runs-on }}
-      base-image: ${{ matrix.base-image }}
-      grpc-base-image: ${{ matrix.grpc-base-image }}
-      aio: ${{ matrix.aio }}
-      makeflags: ${{ matrix.makeflags }}
-      latest-image: ${{ matrix.latest-image }}
-      latest-image-aio: ${{ matrix.latest-image-aio }}
-    secrets:
-      dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
-      dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
-      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-    strategy:
-      # Pushing with all jobs in parallel
-      # eats the bandwidth of all the nodes
-      max-parallel: 2
-      matrix:
-        include:
-          - build-type: 'hipblas'
-            platforms: 'linux/amd64'
-            tag-latest: 'auto'
-            tag-suffix: '-hipblas-ffmpeg'
-            ffmpeg: 'true'
-            image-type: 'extras'
-            aio: "-aio-gpu-hipblas"
-            base-image: "rocm/dev-ubuntu-22.04:6.1"
-            grpc-base-image: "ubuntu:22.04"
-            latest-image: 'latest-gpu-hipblas'
-            latest-image-aio: 'latest-aio-gpu-hipblas'
-            runs-on: 'arc-runner-set'
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: 'hipblas'
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            tag-suffix: '-hipblas'
-            ffmpeg: 'false'
-            image-type: 'extras'
-            base-image: "rocm/dev-ubuntu-22.04:6.1"
-            grpc-base-image: "ubuntu:22.04"
-            runs-on: 'arc-runner-set'
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: 'hipblas'
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            tag-suffix: '-hipblas-ffmpeg-core'
-            ffmpeg: 'true'
-            image-type: 'core'
-            base-image: "rocm/dev-ubuntu-22.04:6.1"
-            grpc-base-image: "ubuntu:22.04"
-            runs-on: 'arc-runner-set'
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: 'hipblas'
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            tag-suffix: '-hipblas-core'
-            ffmpeg: 'false'
-            image-type: 'core'
-            base-image: "rocm/dev-ubuntu-22.04:6.1"
-            grpc-base-image: "ubuntu:22.04"
-            runs-on: 'arc-runner-set'
-            makeflags: "--jobs=3 --output-sync=target"
-  self-hosted-jobs:
-    uses: ./.github/workflows/image_build.yml
-    with:
-      tag-latest: ${{ matrix.tag-latest }}
-      tag-suffix: ${{ matrix.tag-suffix }}
-      ffmpeg: ${{ matrix.ffmpeg }}
-      image-type: ${{ matrix.image-type }}
-      build-type: ${{ matrix.build-type }}
-      cuda-major-version: ${{ matrix.cuda-major-version }}
-      cuda-minor-version: ${{ matrix.cuda-minor-version }}
-      platforms: ${{ matrix.platforms }}
-      runs-on: ${{ matrix.runs-on }}
-      base-image: ${{ matrix.base-image }}
-      grpc-base-image: ${{ matrix.grpc-base-image }}
-      aio: ${{ matrix.aio }}
-      makeflags: ${{ matrix.makeflags }}
-      latest-image: ${{ matrix.latest-image }}
-      latest-image-aio: ${{ matrix.latest-image-aio }}
-    secrets:
-      dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
-      dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
-      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-    strategy:
-      # Pushing with all jobs in parallel
-      # eats the bandwidth of all the nodes
-      max-parallel: ${{ github.event_name != 'pull_request' && 5 || 8 }}
-      matrix:
-        include:
-          # Extra images
-          - build-type: ''
-            #platforms: 'linux/amd64,linux/arm64'
-            platforms: 'linux/amd64'
-            tag-latest: 'auto'
-            tag-suffix: ''
-            ffmpeg: ''
-            image-type: 'extras'
-            runs-on: 'arc-runner-set'
-            base-image: "ubuntu:22.04"
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: ''
-            platforms: 'linux/amd64'
-            tag-latest: 'auto'
-            tag-suffix: '-ffmpeg'
-            ffmpeg: 'true'
-            image-type: 'extras'
-            runs-on: 'arc-runner-set'
-            base-image: "ubuntu:22.04"
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: 'cublas'
-            cuda-major-version: "11"
-            cuda-minor-version: "7"
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            tag-suffix: '-cublas-cuda11'
-            ffmpeg: ''
-            image-type: 'extras'
-            runs-on: 'arc-runner-set'
-            base-image: "ubuntu:22.04"
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: 'cublas'
-            cuda-major-version: "12"
-            cuda-minor-version: "0"
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            tag-suffix: '-cublas-cuda12'
-            ffmpeg: ''
-            image-type: 'extras'
-            runs-on: 'arc-runner-set'
-            base-image: "ubuntu:22.04"
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: 'cublas'
-            cuda-major-version: "11"
-            cuda-minor-version: "7"
-            platforms: 'linux/amd64'
-            tag-latest: 'auto'
-            tag-suffix: '-cublas-cuda11-ffmpeg'
-            ffmpeg: 'true'
-            image-type: 'extras'
-            runs-on: 'arc-runner-set'
-            base-image: "ubuntu:22.04"
-            aio: "-aio-gpu-nvidia-cuda-11"
-            latest-image: 'latest-gpu-nvidia-cuda-11'
-            latest-image-aio: 'latest-aio-gpu-nvidia-cuda-11'
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: 'cublas'
-            cuda-major-version: "12"
-            cuda-minor-version: "0"
-            platforms: 'linux/amd64'
-            tag-latest: 'auto'
-            tag-suffix: '-cublas-cuda12-ffmpeg'
-            ffmpeg: 'true'
-            image-type: 'extras'
-            runs-on: 'arc-runner-set'
-            base-image: "ubuntu:22.04"
-            aio: "-aio-gpu-nvidia-cuda-12"
-            latest-image: 'latest-gpu-nvidia-cuda-12'
-            latest-image-aio: 'latest-aio-gpu-nvidia-cuda-12'
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: ''
-            #platforms: 'linux/amd64,linux/arm64'
-            platforms: 'linux/amd64'
-            tag-latest: 'auto'
-            tag-suffix: ''
-            ffmpeg: ''
-            image-type: 'extras'
-            base-image: "ubuntu:22.04"
-            runs-on: 'arc-runner-set'
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: 'sycl_f16'
-            platforms: 'linux/amd64'
-            tag-latest: 'auto'
-            base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
-            grpc-base-image: "ubuntu:22.04"
-            tag-suffix: '-sycl-f16-ffmpeg'
-            ffmpeg: 'true'
-            image-type: 'extras'
-            runs-on: 'arc-runner-set'
-            aio: "-aio-gpu-intel-f16"
-            latest-image: 'latest-gpu-intel-f16'
-            latest-image-aio: 'latest-aio-gpu-intel-f16'
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: 'sycl_f32'
-            platforms: 'linux/amd64'
-            tag-latest: 'auto'
-            base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
-            grpc-base-image: "ubuntu:22.04"
-            tag-suffix: '-sycl-f32-ffmpeg'
-            ffmpeg: 'true'
-            image-type: 'extras'
-            runs-on: 'arc-runner-set'
-            aio: "-aio-gpu-intel-f32"
-            latest-image: 'latest-gpu-intel-f32'
-            latest-image-aio: 'latest-aio-gpu-intel-f32'
-            makeflags: "--jobs=3 --output-sync=target"
-          # Core images
-          - build-type: 'sycl_f16'
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
-            grpc-base-image: "ubuntu:22.04"
-            tag-suffix: '-sycl-f16-core'
-            ffmpeg: 'false'
-            image-type: 'core'
-            runs-on: 'arc-runner-set'
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: 'sycl_f32'
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
-            grpc-base-image: "ubuntu:22.04"
-            tag-suffix: '-sycl-f32-core'
-            ffmpeg: 'false'
-            image-type: 'core'
-            runs-on: 'arc-runner-set'
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: 'sycl_f16'
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
-            grpc-base-image: "ubuntu:22.04"
-            tag-suffix: '-sycl-f16-ffmpeg-core'
-            ffmpeg: 'true'
-            image-type: 'core'
-            runs-on: 'arc-runner-set'
-            makeflags: "--jobs=3 --output-sync=target"
-          - build-type: 'sycl_f32'
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            base-image: "quay.io/go-skynet/intel-oneapi-base:latest"
-            grpc-base-image: "ubuntu:22.04"
-            tag-suffix: '-sycl-f32-ffmpeg-core'
-            ffmpeg: 'true'
-            image-type: 'core'
-            runs-on: 'arc-runner-set'
-            makeflags: "--jobs=3 --output-sync=target"
-
-  core-image-build:
-    uses: ./.github/workflows/image_build.yml
-    with:
-      tag-latest: ${{ matrix.tag-latest }}
-      tag-suffix: ${{ matrix.tag-suffix }}
-      ffmpeg: ${{ matrix.ffmpeg }}
-      image-type: ${{ matrix.image-type }}
-      build-type: ${{ matrix.build-type }}
-      cuda-major-version: ${{ matrix.cuda-major-version }}
-      cuda-minor-version: ${{ matrix.cuda-minor-version }}
-      platforms: ${{ matrix.platforms }}
-      runs-on: ${{ matrix.runs-on }}
-      aio: ${{ matrix.aio }}
-      base-image: ${{ matrix.base-image }}
-      grpc-base-image: ${{ matrix.grpc-base-image }}
-      makeflags: ${{ matrix.makeflags }}
-      latest-image: ${{ matrix.latest-image }}
-      latest-image-aio: ${{ matrix.latest-image-aio }}
-      skip-drivers: ${{ matrix.skip-drivers }}
-    secrets:
-      dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
-      dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
-      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-    strategy:
-      max-parallel: ${{ github.event_name != 'pull_request' && 2 || 4 }}
-      matrix:
-        include:
-          - build-type: ''
-            platforms: 'linux/amd64,linux/arm64'
-            tag-latest: 'auto'
-            tag-suffix: '-ffmpeg-core'
-            ffmpeg: 'true'
-            image-type: 'core'
-            base-image: "ubuntu:22.04"
-            runs-on: 'arc-runner-set'
-            aio: "-aio-cpu"
-            latest-image: 'latest-cpu'
-            latest-image-aio: 'latest-aio-cpu'
-            makeflags: "--jobs=4 --output-sync=target"
-            skip-drivers: 'false'
-          - build-type: 'cublas'
-            cuda-major-version: "11"
-            cuda-minor-version: "7"
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            tag-suffix: '-cublas-cuda11-core'
-            ffmpeg: ''
-            image-type: 'core'
-            base-image: "ubuntu:22.04"
-            runs-on: 'arc-runner-set'
-            makeflags: "--jobs=4 --output-sync=target"
-            skip-drivers: 'false'
-          - build-type: 'cublas'
-            cuda-major-version: "12"
-            cuda-minor-version: "0"
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            tag-suffix: '-cublas-cuda12-core'
-            ffmpeg: ''
-            image-type: 'core'
-            base-image: "ubuntu:22.04"
-            runs-on: 'arc-runner-set'
-            makeflags: "--jobs=4 --output-sync=target"
-            skip-drivers: 'false'
-          - build-type: 'cublas'
-            cuda-major-version: "11"
-            cuda-minor-version: "7"
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            tag-suffix: '-cublas-cuda11-ffmpeg-core'
-            ffmpeg: 'true'
-            image-type: 'core'
-            runs-on: 'arc-runner-set'
-            base-image: "ubuntu:22.04"
-            makeflags: "--jobs=4 --output-sync=target"
-            skip-drivers: 'false'
-          - build-type: 'cublas'
-            cuda-major-version: "12"
-            cuda-minor-version: "0"
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            tag-suffix: '-cublas-cuda12-ffmpeg-core'
-            ffmpeg: 'true'
-            image-type: 'core'
-            runs-on: 'arc-runner-set'
-            base-image: "ubuntu:22.04"
-            skip-drivers: 'false'
-            makeflags: "--jobs=4 --output-sync=target"
-          - build-type: 'vulkan'
-            platforms: 'linux/amd64'
-            tag-latest: 'false'
-            tag-suffix: '-vulkan-ffmpeg-core'
-            latest-image: 'latest-vulkan-ffmpeg-core'
-            ffmpeg: 'true'
-            image-type: 'core'
-            runs-on: 'arc-runner-set'
-            base-image: "ubuntu:22.04"
-            skip-drivers: 'false'
-            makeflags: "--jobs=4 --output-sync=target"
-  gh-runner:
-    uses: ./.github/workflows/image_build.yml
-    with:
-      tag-latest: ${{ matrix.tag-latest }}
-      tag-suffix: ${{ matrix.tag-suffix }}
-      ffmpeg: ${{ matrix.ffmpeg }}
-      image-type: ${{ matrix.image-type }}
-      build-type: ${{ matrix.build-type }}
-      cuda-major-version: ${{ matrix.cuda-major-version }}
-      cuda-minor-version: ${{ matrix.cuda-minor-version }}
-      platforms: ${{ matrix.platforms }}
-      runs-on: ${{ matrix.runs-on }}
-      aio: ${{ matrix.aio }}
-      base-image: ${{ matrix.base-image }}
-      grpc-base-image: ${{ matrix.grpc-base-image }}
-      makeflags: ${{ matrix.makeflags }}
-      latest-image: ${{ matrix.latest-image }}
-      latest-image-aio: ${{ matrix.latest-image-aio }}
-      skip-drivers: ${{ matrix.skip-drivers }}
-    secrets:
-      dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
-      dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
-      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-    strategy:
-      matrix:
-        include:
-          - build-type: 'cublas'
-            cuda-major-version: "12"
-            cuda-minor-version: "0"
-            platforms: 'linux/arm64'
-            tag-latest: 'false'
-            tag-suffix: '-nvidia-l4t-arm64-core'
-            latest-image: 'latest-nvidia-l4t-arm64-core'
-            ffmpeg: 'true'
-            image-type: 'core'
-            base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
-            runs-on: 'ubuntu-24.04-arm'
-            makeflags: "--jobs=4 --output-sync=target"
-            skip-drivers: 'true'
+    core-image-build:
+      if: github.repository == 'mudler/LocalAI'
+      uses: ./.github/workflows/image_build.yml
+      with:
+        tag-latest: ${{ matrix.tag-latest }}
+        tag-suffix: ${{ matrix.tag-suffix }}
+        build-type: ${{ matrix.build-type }}
+        cuda-major-version: ${{ matrix.cuda-major-version }}
+        cuda-minor-version: ${{ matrix.cuda-minor-version }}
+        platforms: ${{ matrix.platforms }}
+        runs-on: ${{ matrix.runs-on }}
+        base-image: ${{ matrix.base-image }}
+        makeflags: ${{ matrix.makeflags }}
+        skip-drivers: ${{ matrix.skip-drivers }}
+        ubuntu-version: ${{ matrix.ubuntu-version }}
+        ubuntu-codename: ${{ matrix.ubuntu-codename }}
+      secrets:
+        dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
+        dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
+        quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+        quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+      strategy:
+        #max-parallel: ${{ github.event_name != 'pull_request' && 2 || 4 }}
+        matrix:
+          include:
+            - build-type: ''
+              platforms: 'linux/amd64,linux/arm64'
+              tag-latest: 'auto'
+              tag-suffix: ''
+              base-image: "ubuntu:24.04"
+              runs-on: 'ubuntu-latest'
+              makeflags: "--jobs=4 --output-sync=target"
+              skip-drivers: 'false'
+              ubuntu-version: '2404'
+              ubuntu-codename: 'noble'
+            - build-type: 'cublas'
+              cuda-major-version: "12"
+              cuda-minor-version: "8"
+              platforms: 'linux/amd64'
+              tag-latest: 'auto'
+              tag-suffix: '-gpu-nvidia-cuda-12'
+              runs-on: 'ubuntu-latest'
+              base-image: "ubuntu:24.04"
+              skip-drivers: 'false'
+              makeflags: "--jobs=4 --output-sync=target"
+              ubuntu-version: '2404'
+              ubuntu-codename: 'noble'
+            - build-type: 'cublas'
+              cuda-major-version: "13"
+              cuda-minor-version: "0"
+              platforms: 'linux/amd64'
+              tag-latest: 'auto'
+              tag-suffix: '-gpu-nvidia-cuda-13'
+              runs-on: 'ubuntu-latest'
+              base-image: "ubuntu:22.04"
+              skip-drivers: 'false'
+              makeflags: "--jobs=4 --output-sync=target"
+              ubuntu-version: '2404'
+              ubuntu-codename: 'noble'
+            - build-type: 'vulkan'
+              platforms: 'linux/amd64,linux/arm64'
+              tag-latest: 'auto'
+              tag-suffix: '-gpu-vulkan'
+              runs-on: 'ubuntu-latest'
+              base-image: "ubuntu:24.04"
+              skip-drivers: 'false'
+              makeflags: "--jobs=4 --output-sync=target"
+              ubuntu-version: '2404'
+              ubuntu-codename: 'noble'
+            - build-type: 'intel'
+              platforms: 'linux/amd64'
+              tag-latest: 'auto'
+              base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
+              tag-suffix: '-gpu-intel'
+              runs-on: 'ubuntu-latest'
+              makeflags: "--jobs=3 --output-sync=target"
+              ubuntu-version: '2404'
+              ubuntu-codename: 'noble'
+  
+    gh-runner:
+      if: github.repository == 'mudler/LocalAI'
+      uses: ./.github/workflows/image_build.yml
+      with:
+        tag-latest: ${{ matrix.tag-latest }}
+        tag-suffix: ${{ matrix.tag-suffix }}
+        build-type: ${{ matrix.build-type }}
+        cuda-major-version: ${{ matrix.cuda-major-version }}
+        cuda-minor-version: ${{ matrix.cuda-minor-version }}
+        platforms: ${{ matrix.platforms }}
+        runs-on: ${{ matrix.runs-on }}
+        base-image: ${{ matrix.base-image }}
+        makeflags: ${{ matrix.makeflags }}
+        skip-drivers: ${{ matrix.skip-drivers }}
+        ubuntu-version: ${{ matrix.ubuntu-version }}
+        ubuntu-codename: ${{ matrix.ubuntu-codename }}
+      secrets:
+        dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
+        dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
+        quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+        quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+      strategy:
+        matrix:
+          include:
+            - build-type: 'cublas'
+              cuda-major-version: "12"
+              cuda-minor-version: "0"
+              platforms: 'linux/arm64'
+              tag-latest: 'auto'
+              tag-suffix: '-nvidia-l4t-arm64'
+              base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
+              runs-on: 'ubuntu-24.04-arm'
+              makeflags: "--jobs=4 --output-sync=target"
+              skip-drivers: 'true'
+              ubuntu-version: "2204"
+              ubuntu-codename: 'jammy'
+            - build-type: 'cublas'
+              cuda-major-version: "13"
+              cuda-minor-version: "0"
+              platforms: 'linux/arm64'
+              tag-latest: 'auto'
+              tag-suffix: '-nvidia-l4t-arm64-cuda-13'
+              base-image: "ubuntu:24.04"
+              runs-on: 'ubuntu-24.04-arm'
+              makeflags: "--jobs=4 --output-sync=target"
+              skip-drivers: 'false'
+              ubuntu-version: '2404'
+              ubuntu-codename: 'noble'
+  
--- a/.github/workflows/image_build.yml
+++ b/.github/workflows/image_build.yml
@@ -8,11 +8,6 @@ on:
        description: 'Base image'
        required: true
        type: string
-      grpc-base-image:
-        description: 'GRPC Base image, must be a compatible image with base-image'
-        required: false
-        default: ''
-        type: string
      build-type:
        description: 'Build type'
        default: ''
@@ -23,7 +18,7 @@ on:
        type: string
      cuda-minor-version:
        description: 'CUDA minor version'
-        default: "4"
+        default: "9"
        type: string
      platforms:
        description: 'Platforms'
@@ -33,30 +28,14 @@ on:
        description: 'Tag latest'
        default: ''
        type: string
-      latest-image:
-          description: 'Tag latest'
-          default: ''
-          type: string
-      latest-image-aio:
-          description: 'Tag latest'
-          default: ''
-          type: string
      tag-suffix:
        description: 'Tag suffix'
        default: ''
        type: string
-      ffmpeg:
-        description: 'FFMPEG'
-        default: ''
-        type: string
      skip-drivers:
        description: 'Skip drivers by default'
        default: 'false'
        type: string
-      image-type:
-        description: 'Image type'
-        default: ''
-        type: string
      runs-on:
        description: 'Runs on'
        required: true
@@ -67,10 +46,15 @@ on:
        required: false
        default: '--jobs=4 --output-sync=target'
        type: string
-      aio:
-        description: 'AIO Image Name'
+      ubuntu-version:
+        description: 'Ubuntu version'
        required: false
-        default: ''
+        default: '2204'
+        type: string
+      ubuntu-codename:
+        description: 'Ubuntu codename'
+        required: false
+        default: 'noble'
        type: string
    secrets:
      dockerUsername:
@@ -85,16 +69,29 @@ jobs:
  reusable_image-build:
    runs-on: ${{ inputs.runs-on }}
    steps:
-      - name: Force Install GIT latest
-        run: |
-          sudo apt-get update \
-          && sudo apt-get install -y software-properties-common \
-          && sudo apt-get update \
-          && sudo add-apt-repository -y ppa:git-core/ppa \
-          && sudo apt-get update \
-          && sudo apt-get install -y git
+
      - name: Checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
+
+      - name: Configure apt mirror on runner
+        id: apt_mirror
+        uses: ./.github/actions/configure-apt-mirror
+
+      - name: Free Disk Space (Ubuntu)
+        if: inputs.runs-on == 'ubuntu-latest'
+        uses: jlumbroso/free-disk-space@main
+        with:
+          # this might remove tools that are actually needed,
+          # if set to "true" but frees about 6 GB
+          tool-cache: true
+          # all of these default to true, but feel free to set to
+          # "false" if necessary for your workflow
+          android: true
+          dotnet: true
+          haskell: true
+          large-packages: true
+          docker-images: true
+          swap-storage: true

      - name: Release space from worker
        if: inputs.runs-on == 'ubuntu-latest'
@@ -106,8 +103,8 @@ jobs:
          df -h
          echo
          sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
-          sudo apt-get remove --auto-remove android-sdk-platform-tools || true
-          sudo apt-get purge --auto-remove android-sdk-platform-tools || true
+          sudo apt-get remove --auto-remove android-sdk-platform-tools snapd || true
+          sudo apt-get purge --auto-remove android-sdk-platform-tools snapd || true
          sudo rm -rf /usr/local/lib/android
          sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
          sudo rm -rf /usr/share/dotnet
@@ -141,7 +138,7 @@ jobs:
      - name: Docker meta
        id: meta
        if: github.event_name != 'pull_request'
-        uses: docker/metadata-action@v5
+        uses: docker/metadata-action@v6
        with:
          images: |
            quay.io/go-skynet/local-ai
@@ -152,48 +149,21 @@ jobs:
            type=sha
          flavor: |
            latest=${{ inputs.tag-latest }}
-            suffix=${{ inputs.tag-suffix }}
+            suffix=${{ inputs.tag-suffix }},onlatest=true
      - name: Docker meta for PR
        id: meta_pull_request
        if: github.event_name == 'pull_request'
-        uses: docker/metadata-action@v5
+        uses: docker/metadata-action@v6
        with:
          images: |
-            ttl.sh/localai-ci-pr-${{ github.event.number }}
+            quay.io/go-skynet/ci-tests
          tags: |
-            type=ref,event=branch
-            type=semver,pattern={{raw}}
-            type=sha
+            type=ref,event=branch,suffix=localai${{ github.event.number }}-${{ inputs.build-type }}-${{ inputs.cuda-major-version }}-${{ inputs.cuda-minor-version }}
+            type=semver,pattern={{raw}},suffix=localai${{ github.event.number }}-${{ inputs.build-type }}-${{ inputs.cuda-major-version }}-${{ inputs.cuda-minor-version }}
+            type=sha,suffix=localai${{ github.event.number }}-${{ inputs.build-type }}-${{ inputs.cuda-major-version }}-${{ inputs.cuda-minor-version }}
          flavor: |
            latest=${{ inputs.tag-latest }}
            suffix=${{ inputs.tag-suffix }}
-      - name: Docker meta AIO (quay.io)
-        if: inputs.aio != ''
-        id: meta_aio
-        uses: docker/metadata-action@v5
-        with:
-          images: |
-            quay.io/go-skynet/local-ai
-          tags: |
-            type=ref,event=branch
-            type=semver,pattern={{raw}}
-          flavor: |
-            latest=${{ inputs.tag-latest }}
-            suffix=${{ inputs.aio }}
-
-      - name: Docker meta AIO (dockerhub)
-        if: inputs.aio != ''
-        id: meta_aio_dockerhub
-        uses: docker/metadata-action@v5
-        with:
-          images: |
-            localai/localai
-          tags: |
-            type=ref,event=branch
-            type=semver,pattern={{raw}}
-          flavor: |
-            suffix=${{ inputs.aio }}
-
      - name: Set up QEMU
        uses: docker/setup-qemu-action@master
        with:
@@ -205,137 +175,68 @@ jobs:

      - name: Login to DockerHub
        if: github.event_name != 'pull_request'
-        uses: docker/login-action@v3
+        uses: docker/login-action@v4
        with:
          username: ${{ secrets.dockerUsername }}
          password: ${{ secrets.dockerPassword }}

      - name: Login to DockerHub
        if: github.event_name != 'pull_request'
-        uses: docker/login-action@v3
+        uses: docker/login-action@v4
        with:
          registry: quay.io
          username: ${{ secrets.quayUsername }}
          password: ${{ secrets.quayPassword }}

      - name: Build and push
-        uses: docker/build-push-action@v6
+        uses: docker/build-push-action@v7
        if: github.event_name != 'pull_request'
        with:
          builder: ${{ steps.buildx.outputs.name }}
-          # The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
-          # This means that even the MAKEFLAGS have to be an EXACT match.
-          # If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
-          # This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
          build-args: |
            BUILD_TYPE=${{ inputs.build-type }}
            CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
            CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
-            FFMPEG=${{ inputs.ffmpeg }}
-            IMAGE_TYPE=${{ inputs.image-type }}
            BASE_IMAGE=${{ inputs.base-image }}
-            GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
-            GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
-            GRPC_VERSION=v1.65.0
            MAKEFLAGS=${{ inputs.makeflags }}
            SKIP_DRIVERS=${{ inputs.skip-drivers }}
+            UBUNTU_VERSION=${{ inputs.ubuntu-version }}
+            UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
+            APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
+            APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
          context: .
          file: ./Dockerfile
-          cache-from: type=gha
+          cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
+          cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }},mode=max,ignore-error=true
          platforms: ${{ inputs.platforms }}
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
 ### Start testing image
      - name: Build and push
-        uses: docker/build-push-action@v6
+        uses: docker/build-push-action@v7
        if: github.event_name == 'pull_request'
        with:
          builder: ${{ steps.buildx.outputs.name }}
-          # The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
-          # This means that even the MAKEFLAGS have to be an EXACT match.
-          # If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
-          # This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
          build-args: |
            BUILD_TYPE=${{ inputs.build-type }}
            CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
            CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
-            FFMPEG=${{ inputs.ffmpeg }}
-            IMAGE_TYPE=${{ inputs.image-type }}
            BASE_IMAGE=${{ inputs.base-image }}
-            GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
-            GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
-            GRPC_VERSION=v1.65.0
            MAKEFLAGS=${{ inputs.makeflags }}
            SKIP_DRIVERS=${{ inputs.skip-drivers }}
+            UBUNTU_VERSION=${{ inputs.ubuntu-version }}
+            UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
+            APT_MIRROR=${{ steps.apt_mirror.outputs.effective-mirror }}
+            APT_PORTS_MIRROR=${{ steps.apt_mirror.outputs.effective-ports-mirror }}
          context: .
          file: ./Dockerfile
-          cache-from: type=gha
+          cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
          platforms: ${{ inputs.platforms }}
-          push: true
+          #push: true
          tags: ${{ steps.meta_pull_request.outputs.tags }}
          labels: ${{ steps.meta_pull_request.outputs.labels }}
-      - name: Testing image
-        if: github.event_name == 'pull_request'
-        run: |
-          echo "Image is available at ttl.sh/localai-ci-pr-${{ github.event.number }}:${{ steps.meta_pull_request.outputs.version }}" >> $GITHUB_STEP_SUMMARY
 ## End testing image
-      - name: Build and push AIO image
-        if: inputs.aio != ''
-        uses: docker/build-push-action@v6
-        with:
-          builder: ${{ steps.buildx.outputs.name }}
-          build-args: |
-            BASE_IMAGE=quay.io/go-skynet/local-ai:${{ steps.meta.outputs.version }}
-            MAKEFLAGS=${{ inputs.makeflags }}
-          context: .
-          file: ./Dockerfile.aio
-          platforms: ${{ inputs.platforms }}
-          push: ${{ github.event_name != 'pull_request' }}
-          tags: ${{ steps.meta_aio.outputs.tags }}
-          labels: ${{ steps.meta_aio.outputs.labels }}
-
-      - name: Build and push AIO image (dockerhub)
-        if: inputs.aio != ''
-        uses: docker/build-push-action@v6
-        with:
-          builder: ${{ steps.buildx.outputs.name }}
-          build-args: |
-            BASE_IMAGE=localai/localai:${{ steps.meta.outputs.version }}
-            MAKEFLAGS=${{ inputs.makeflags }}
-          context: .
-          file: ./Dockerfile.aio
-          platforms: ${{ inputs.platforms }}
-          push: ${{ github.event_name != 'pull_request' }}
-          tags: ${{ steps.meta_aio_dockerhub.outputs.tags }}
-          labels: ${{ steps.meta_aio_dockerhub.outputs.labels }}
-
-      - name: Latest tag
-        # run this on branches, when it is a tag and there is a latest-image defined
-        if: github.event_name != 'pull_request' && inputs.latest-image != ''  && github.ref_type == 'tag'
-        run: |
-          docker pull localai/localai:${{ steps.meta.outputs.version }}
-          docker tag localai/localai:${{ steps.meta.outputs.version }} localai/localai:${{ inputs.latest-image }}
-          docker push localai/localai:${{ inputs.latest-image }}
-          docker pull quay.io/go-skynet/local-ai:${{ steps.meta.outputs.version }}
-          docker tag quay.io/go-skynet/local-ai:${{ steps.meta.outputs.version }} quay.io/go-skynet/local-ai:${{ inputs.latest-image }}
-          docker push quay.io/go-skynet/local-ai:${{ inputs.latest-image }}
-      - name: Latest AIO tag
-        # run this on branches, when it is a tag and there is a latest-image defined
-        if: github.event_name != 'pull_request' && inputs.latest-image-aio != ''  && github.ref_type == 'tag'
-        run: |
-          docker pull localai/localai:${{ steps.meta_aio_dockerhub.outputs.version }}
-          docker tag localai/localai:${{ steps.meta_aio_dockerhub.outputs.version }} localai/localai:${{ inputs.latest-image-aio }}
-          docker push localai/localai:${{ inputs.latest-image-aio }}
-          docker pull quay.io/go-skynet/local-ai:${{ steps.meta_aio.outputs.version }}
-          docker tag quay.io/go-skynet/local-ai:${{ steps.meta_aio.outputs.version }} quay.io/go-skynet/local-ai:${{ inputs.latest-image-aio }}
-          docker push quay.io/go-skynet/local-ai:${{ inputs.latest-image-aio }}
-
      - name: job summary
        run: |
          echo "Built image: ${{ steps.meta.outputs.labels }}" >> $GITHUB_STEP_SUMMARY
-
-      - name: job summary(AIO)
-        if: inputs.aio != ''
-        run: |
-          echo "Built image: ${{ steps.meta_aio.outputs.labels }}" >> $GITHUB_STEP_SUMMARY
--- a/.github/workflows/lint.yml
+++ b/.github/workflows/lint.yml
@@ -0,0 +1,48 @@
+---
+name: 'lint'
+
+on:
+  pull_request:
+    paths-ignore:
+      - 'docs/**'
+      - 'examples/**'
+      - 'README.md'
+      - '**/*.md'
+  push:
+    branches:
+      - master
+
+concurrency:
+  group: ci-lint-${{ github.head_ref || github.ref }}-${{ github.repository }}
+  cancel-in-progress: true
+
+jobs:
+  golangci-lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          # Full history so golangci-lint's new-from-merge-base can reach
+          # origin/master and compute the diff against it.
+          fetch-depth: 0
+      - uses: actions/setup-go@v5
+        with:
+          go-version: '1.26.x'
+          cache: false
+      - name: install golangci-lint
+        run: |
+          curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh \
+            | sh -s -- -b "$(go env GOPATH)/bin" v2.11.4
+      - name: generate grpc proto sources
+        # pkg/grpc/proto/*.go is generated, not checked in. Several packages
+        # import it, so without this step typecheck fails project-wide.
+        run: make protogen-go
+      - name: stub react-ui dist for go:embed
+        # core/http/app.go has //go:embed react-ui/dist/*; the glob needs at
+        # least one non-hidden entry to satisfy typecheck. We don't run
+        # `make react-ui` here because lint doesn't need the real bundle.
+        run: |
+          mkdir -p core/http/react-ui/dist
+          touch core/http/react-ui/dist/index.html
+      - name: lint
+        run: make lint
--- a/.github/workflows/notify-releases.yaml
+++ b/.github/workflows/notify-releases.yaml
@@ -6,15 +6,17 @@ on:

 jobs:
  notify-discord:
+    if: github.repository == 'mudler/LocalAI'
    runs-on: ubuntu-latest
    env:
        RELEASE_BODY: ${{ github.event.release.body }}
        RELEASE_TITLE: ${{ github.event.release.name }}
        RELEASE_TAG_NAME: ${{ github.event.release.tag_name }}
+        MODEL_NAME: gemma-3-12b-it-qat
    steps:
    - uses: mudler/localai-github-action@v1
      with:
-        model: 'hermes-2-theta-llama-3-8b' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
+        model: 'gemma-3-12b-it-qat' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
    - name: Summarize
      id: summarize
      run: |
@@ -60,4 +62,4 @@ jobs:
        DISCORD_AVATAR: "https://avatars.githubusercontent.com/u/139863280?v=4"
      uses: Ilshidur/action-discord@master
      with:
-        args: ${{ steps.summarize.outputs.message }}
+        args: ${{ steps.summarize.outputs.message }}
--- a/.github/workflows/release.yaml
+++ b/.github/workflows/release.yaml
@@ -1,324 +1,66 @@
-name: Build and Release
+name: goreleaser

 on:
  push:
-    branches:
-      - master
    tags:
      - 'v*'
-  pull_request:
-
-env:
-  GRPC_VERSION: v1.65.0
-
-permissions:
-  contents: write
-
-concurrency:
-  group: ci-releases-${{ github.head_ref || github.ref }}-${{ github.repository }}
-  cancel-in-progress: true

 jobs:
-
-  build-linux-arm:
+  goreleaser:
    runs-on: ubuntu-latest
    steps:
-      - name: Clone
-        uses: actions/checkout@v4
+      - name: Checkout
+        uses: actions/checkout@v6
        with:
-          submodules: true
-      - uses: actions/setup-go@v5
+          fetch-depth: 0
+      - name: Set up Go
+        uses: actions/setup-go@v5
        with:
-          go-version: '1.21.x'
-          cache: false
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install build-essential ffmpeg protobuf-compiler ccache upx-ucl gawk
-          sudo apt-get install -qy binutils-aarch64-linux-gnu gcc-aarch64-linux-gnu g++-aarch64-linux-gnu libgmock-dev
-      - name: Install CUDA Dependencies
-        run: |
-          curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/cross-linux-aarch64/cuda-keyring_1.1-1_all.deb
-          sudo dpkg -i cuda-keyring_1.1-1_all.deb
-          sudo apt-get update
-          sudo apt-get install -y cuda-cross-aarch64 cuda-nvcc-cross-aarch64-${CUDA_VERSION} libcublas-cross-aarch64-${CUDA_VERSION}
+          go-version: 1.23
+      - name: Run GoReleaser
+        uses: goreleaser/goreleaser-action@v7
+        with:
+          version: v2.11.0
+          args: release --clean
        env:
-          CUDA_VERSION: 12-4
-      - name: Cache grpc
-        id: cache-grpc
-        uses: actions/cache@v4
-        with:
-          path: grpc
-          key: ${{ runner.os }}-arm-grpc-${{ env.GRPC_VERSION }}
-      - name: Build grpc
-        if: steps.cache-grpc.outputs.cache-hit != 'true'
-        run: |
-
-          git clone --recurse-submodules -b ${{ env.GRPC_VERSION }} --depth 1 --shallow-submodules https://github.com/grpc/grpc && \
-          cd grpc && sed -i "216i\  TESTONLY" "third_party/abseil-cpp/absl/container/CMakeLists.txt" && mkdir -p cmake/build && \
-          cd cmake/build && cmake -DgRPC_INSTALL=ON \
-            -DgRPC_BUILD_TESTS=OFF \
-            ../.. && sudo make --jobs 5 --output-sync=target
-      - name: Install gRPC
-        run: |
-          GNU_HOST=aarch64-linux-gnu
-          C_COMPILER_ARM_LINUX=$GNU_HOST-gcc
-          CXX_COMPILER_ARM_LINUX=$GNU_HOST-g++
-
-          CROSS_TOOLCHAIN=/usr/$GNU_HOST
-          CROSS_STAGING_PREFIX=$CROSS_TOOLCHAIN/stage
-          CMAKE_CROSS_TOOLCHAIN=/tmp/arm.toolchain.cmake
-
-          # https://cmake.org/cmake/help/v3.13/manual/cmake-toolchains.7.html#cross-compiling-for-linux
-          echo "set(CMAKE_SYSTEM_NAME Linux)" >> $CMAKE_CROSS_TOOLCHAIN && \
-            echo "set(CMAKE_SYSTEM_PROCESSOR arm)" >> $CMAKE_CROSS_TOOLCHAIN && \
-            echo "set(CMAKE_STAGING_PREFIX $CROSS_STAGING_PREFIX)" >> $CMAKE_CROSS_TOOLCHAIN && \
-            echo "set(CMAKE_SYSROOT ${CROSS_TOOLCHAIN}/sysroot)" >> $CMAKE_CROSS_TOOLCHAIN && \
-            echo "set(CMAKE_C_COMPILER /usr/bin/$C_COMPILER_ARM_LINUX)" >> $CMAKE_CROSS_TOOLCHAIN && \
-            echo "set(CMAKE_CXX_COMPILER /usr/bin/$CXX_COMPILER_ARM_LINUX)" >> $CMAKE_CROSS_TOOLCHAIN && \
-            echo "set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)" >> $CMAKE_CROSS_TOOLCHAIN && \
-            echo "set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)" >> $CMAKE_CROSS_TOOLCHAIN && \
-            echo "set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)" >> $CMAKE_CROSS_TOOLCHAIN && \
-            echo "set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)" >> $CMAKE_CROSS_TOOLCHAIN
-          GRPC_DIR=$PWD/grpc
-          cd grpc && cd cmake/build && sudo make --jobs 5 --output-sync=target install && \
-          GRPC_CROSS_BUILD_DIR=$GRPC_DIR/cmake/cross_build && \
-          mkdir -p $GRPC_CROSS_BUILD_DIR && \
-          cd $GRPC_CROSS_BUILD_DIR && \
-          cmake -DCMAKE_TOOLCHAIN_FILE=$CMAKE_CROSS_TOOLCHAIN \
-            -DCMAKE_BUILD_TYPE=Release \
-            -DCMAKE_INSTALL_PREFIX=$CROSS_TOOLCHAIN/grpc_install \
-            ../.. && \
-          sudo make -j`nproc` install
-      - name: Build
-        id: build
-        run: |
-          GNU_HOST=aarch64-linux-gnu
-          C_COMPILER_ARM_LINUX=$GNU_HOST-gcc
-          CXX_COMPILER_ARM_LINUX=$GNU_HOST-g++
-
-          CROSS_TOOLCHAIN=/usr/$GNU_HOST
-          CROSS_STAGING_PREFIX=$CROSS_TOOLCHAIN/stage
-          CMAKE_CROSS_TOOLCHAIN=/tmp/arm.toolchain.cmake
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-          export PATH=$PATH:$GOPATH/bin
-          export PATH=/usr/local/cuda/bin:$PATH
-          sudo rm -rf /usr/aarch64-linux-gnu/lib/libstdc++.so.6
-          sudo cp -rf /usr/aarch64-linux-gnu/lib/libstdc++.so* /usr/aarch64-linux-gnu/lib/libstdc++.so.6
-          sudo cp /usr/aarch64-linux-gnu/lib/ld-linux-aarch64.so.1 ld.so
-          BACKEND_LIBS="./grpc/cmake/cross_build/third_party/re2/libre2.a ./grpc/cmake/cross_build/libgrpc.a ./grpc/cmake/cross_build/libgrpc++.a ./grpc/cmake/cross_build/third_party/protobuf/libprotobuf.a /usr/aarch64-linux-gnu/lib/libc.so.6 /usr/aarch64-linux-gnu/lib/libstdc++.so.6 /usr/aarch64-linux-gnu/lib/libgomp.so.1 /usr/aarch64-linux-gnu/lib/libm.so.6 /usr/aarch64-linux-gnu/lib/libgcc_s.so.1 /usr/aarch64-linux-gnu/lib/libdl.so.2 /usr/aarch64-linux-gnu/lib/libpthread.so.0 ./ld.so" \
-          GOOS=linux \
-          GOARCH=arm64 \
-          CMAKE_ARGS="-DProtobuf_INCLUDE_DIRS=$CROSS_STAGING_PREFIX/include -DProtobuf_DIR=$CROSS_STAGING_PREFIX/lib/cmake/protobuf -DgRPC_DIR=$CROSS_STAGING_PREFIX/lib/cmake/grpc -DCMAKE_TOOLCHAIN_FILE=$CMAKE_CROSS_TOOLCHAIN -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++" make dist-cross-linux-arm64
-      - uses: actions/upload-artifact@v4
-        with:
-          name: LocalAI-linux-arm64
-          path: release/
-      - name: Release
-        uses: softprops/action-gh-release@v2
-        if: startsWith(github.ref, 'refs/tags/')
-        with:
-          files: |
-            release/*
-      - name: Setup tmate session if tests fail
-        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.19
-        with:
-          detached: true
-          connect-timeout-seconds: 180
-          limit-access-to-actor: true
-  build-linux:
-    runs-on: arc-runner-set
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+  launcher-build-darwin:
+    runs-on: macos-latest
    steps:
-      - name: Force Install GIT latest
+      - name: Checkout
+        uses: actions/checkout@v6
+        with:
+          fetch-depth: 0
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.23'
+      - name: Build launcher for macOS ARM64
        run: |
-          sudo apt-get update \
-          && sudo apt-get install -y software-properties-common \
-          && sudo apt-get update \
-          && sudo add-apt-repository -y ppa:git-core/ppa \
-          && sudo apt-get update \
-          && sudo apt-get install -y git
-      - name: Clone
-        uses: actions/checkout@v4
+          make build-launcher-darwin
+      - name: Upload DMG to Release
+        uses: softprops/action-gh-release@v3
        with:
-          submodules: true
-      - uses: actions/setup-go@v5
+          files: ./dist/LocalAI.dmg
+  launcher-build-linux:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
        with:
-          go-version: '1.21.x'
-          cache: false
-      - name: Dependencies
+          fetch-depth: 0
+      - name: Configure apt mirror on runner
+        uses: ./.github/actions/configure-apt-mirror
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.23'
+      - name: Build launcher for Linux
        run: |
          sudo apt-get update
-          sudo apt-get install -y wget curl build-essential ffmpeg protobuf-compiler ccache upx-ucl gawk cmake libgmock-dev
-      - name: Intel Dependencies
-        run: |
-          wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
-          echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
-          sudo apt update
-          sudo apt install -y intel-basekit
-      - name: Install CUDA Dependencies
-        run: |
-          curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
-          sudo dpkg -i cuda-keyring_1.1-1_all.deb
-          sudo apt-get update
-          sudo apt-get install -y cuda-nvcc-${CUDA_VERSION} libcublas-dev-${CUDA_VERSION}
-        env:
-          CUDA_VERSION: 12-5
-      - name: "Install Hipblas"
-        env:
-          ROCM_VERSION: "6.1"
-          AMDGPU_VERSION: "6.1"
-        run: |
-            set -ex
-
-            sudo apt-get update
-            sudo DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends ca-certificates curl libnuma-dev gnupg
-
-            curl -sL https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
-
-            printf "deb [arch=amd64] https://repo.radeon.com/rocm/apt/$ROCM_VERSION/ jammy main" | sudo tee /etc/apt/sources.list.d/rocm.list
-
-            printf "deb [arch=amd64] https://repo.radeon.com/amdgpu/$AMDGPU_VERSION/ubuntu jammy main" | sudo tee /etc/apt/sources.list.d/amdgpu.list
-            printf 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
-            sudo apt-get update
-
-            sudo DEBIAN_FRONTEND=noninteractive apt-get install -y \
-                hipblas-dev rocm-dev \
-                rocblas-dev
-
-            sudo apt-get clean
-            sudo rm -rf /var/lib/apt/lists/*
-            sudo ldconfig
-      - name: Cache grpc
-        id: cache-grpc
-        uses: actions/cache@v4
+          sudo apt-get install -y golang gcc libgl1-mesa-dev xorg-dev libxkbcommon-dev
+          make build-launcher-linux
+      - name: Upload Linux launcher artifacts
+        uses: softprops/action-gh-release@v3
        with:
-          path: grpc
-          key: ${{ runner.os }}-grpc-${{ env.GRPC_VERSION }}
-      - name: Build grpc
-        if: steps.cache-grpc.outputs.cache-hit != 'true'
-        run: |
-          git clone --recurse-submodules -b ${{ env.GRPC_VERSION }} --depth 1 --shallow-submodules https://github.com/grpc/grpc && \
-          cd grpc && sed -i "216i\  TESTONLY" "third_party/abseil-cpp/absl/container/CMakeLists.txt" && mkdir -p cmake/build && \
-          cd cmake/build && cmake -DgRPC_INSTALL=ON \
-            -DgRPC_BUILD_TESTS=OFF \
-            ../.. && sudo make --jobs 5 --output-sync=target
-      - name: Install gRPC
-        run: |
-          cd grpc && cd cmake/build && sudo make --jobs 5 --output-sync=target install
-      # BACKEND_LIBS needed for gpu-workload: /opt/intel/oneapi/*/lib/libiomp5.so /opt/intel/oneapi/*/lib/libmkl_core.so /opt/intel/oneapi/*/lib/libmkl_core.so.2 /opt/intel/oneapi/*/lib/libmkl_intel_ilp64.so /opt/intel/oneapi/*/lib/libmkl_intel_ilp64.so.2 /opt/intel/oneapi/*/lib/libmkl_sycl_blas.so /opt/intel/oneapi/*/lib/libmkl_sycl_blas.so.4 /opt/intel/oneapi/*/lib/libmkl_tbb_thread.so /opt/intel/oneapi/*/lib/libmkl_tbb_thread.so.2 /opt/intel/oneapi/*/lib/libsycl.so /opt/intel/oneapi/*/lib/libsycl.so.7 /opt/intel/oneapi/*/lib/libsycl.so.7.1.0 /opt/rocm-*/lib/libamdhip64.so /opt/rocm-*/lib/libamdhip64.so.5 /opt/rocm-*/lib/libamdhip64.so.6 /opt/rocm-*/lib/libamdhip64.so.6.1.60100 /opt/rocm-*/lib/libhipblas.so /opt/rocm-*/lib/libhipblas.so.2 /opt/rocm-*/lib/libhipblas.so.2.1.60100 /opt/rocm-*/lib/librocblas.so /opt/rocm-*/lib/librocblas.so.4 /opt/rocm-*/lib/librocblas.so.4.1.60100 /usr/lib/x86_64-linux-gnu/libstdc++.so.6 /usr/lib/x86_64-linux-gnu/libOpenCL.so.1 /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/x86_64-linux-gnu/libm.so.6 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 /usr/lib/x86_64-linux-gnu/libc.so.6 /usr/lib/x86_64-linux-gnu/librt.so.1 /usr/local/cuda-*/targets/x86_64-linux/lib/libcublas.so /usr/local/cuda-*/targets/x86_64-linux/lib/libcublasLt.so /usr/local/cuda-*/targets/x86_64-linux/lib/libcudart.so /usr/local/cuda-*/targets/x86_64-linux/lib/stubs/libcuda.so
-      - name: Build
-        id: build
-        run: |
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-          export PATH=$PATH:$GOPATH/bin
-          export PATH=/usr/local/cuda/bin:$PATH
-          export PATH=/opt/rocm/bin:$PATH
-          source /opt/intel/oneapi/setvars.sh
-          sudo cp /lib64/ld-linux-x86-64.so.2 ld.so
-          BACKEND_LIBS="./ld.so ./sources/go-piper/piper/build/fi/lib/libfmt.a ./sources/go-piper/piper-phonemize/pi/lib/libonnxruntime.so.1.14.1 ./sources/go-piper/piper-phonemize/pi/src/libespeak-ng/libespeak-ng.so /usr/lib/x86_64-linux-gnu/libdl.so.2 /usr/lib/x86_64-linux-gnu/librt.so.1 /usr/lib/x86_64-linux-gnu/libpthread.so.0 ./sources/go-piper/piper-phonemize/pi/lib/libpiper_phonemize.so.1 ./sources/go-piper/piper/build/si/lib/libspdlog.a ./sources/go-piper/espeak/ei/lib/libucd.so" \
-          make -j4 dist
-      - uses: actions/upload-artifact@v4
-        with:
-          name: LocalAI-linux
-          path: release/
-      - name: Release
-        uses: softprops/action-gh-release@v2
-        if: startsWith(github.ref, 'refs/tags/')
-        with:
-          files: |
-            release/*
-      - name: Setup tmate session if tests fail
-        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.19
-        with:
-          detached: true
-          connect-timeout-seconds: 180
-          limit-access-to-actor: true
-
-
-  build-macOS-x86_64:
-    runs-on: macos-13
-    steps:
-      - name: Clone
-        uses: actions/checkout@v4
-        with:
-          submodules: true
-      - uses: actions/setup-go@v5
-        with:
-          go-version: '1.21.x'
-          cache: false
-      - name: Dependencies
-        run: |
-          brew install protobuf grpc
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@8ba23be9613c672d40ae261d2a1335d639bdd59b
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.0
-      - name: Build
-        id: build
-        run: |
-          export C_INCLUDE_PATH=/usr/local/include
-          export CPLUS_INCLUDE_PATH=/usr/local/include
-          export PATH=$PATH:$GOPATH/bin
-          export SKIP_GRPC_BACKEND=backend-assets/grpc/whisper
-          make dist
-      - uses: actions/upload-artifact@v4
-        with:
-          name: LocalAI-MacOS-x86_64
-          path: release/
-      - name: Release
-        uses: softprops/action-gh-release@v2
-        if: startsWith(github.ref, 'refs/tags/')
-        with:
-          files: |
-            release/*
-      - name: Setup tmate session if tests fail
-        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.19
-        with:
-          detached: true
-          connect-timeout-seconds: 180
-          limit-access-to-actor: true
-
-  build-macOS-arm64:
-    runs-on: macos-14
-    steps:
-      - name: Clone
-        uses: actions/checkout@v4
-        with:
-          submodules: true
-      - uses: actions/setup-go@v5
-        with:
-          go-version: '1.21.x'
-          cache: false
-      - name: Dependencies
-        run: |
-          brew install protobuf grpc libomp llvm
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-      - name: Build
-        id: build
-        run: |
-          export C_INCLUDE_PATH=/usr/local/include
-          export CPLUS_INCLUDE_PATH=/usr/local/include
-          export PATH=$PATH:$GOPATH/bin
-          export CC=/opt/homebrew/opt/llvm/bin/clang
-          make dist
-      - uses: actions/upload-artifact@v4
-        with:
-          name: LocalAI-MacOS-arm64
-          path: release/
-      - name: Release
-        uses: softprops/action-gh-release@v2
-        if: startsWith(github.ref, 'refs/tags/')
-        with:
-          files: |
-            release/*
-      - name: Setup tmate session if tests fail
-        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.19
-        with:
-          detached: true
-          connect-timeout-seconds: 180
-          limit-access-to-actor: true
+          files: ./local-ai-launcher-linux.tar.xz
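The `launcher-build-linux` job above runs the `configure-apt-mirror` composite action before `apt-get update`. The action's contents are not part of this hunk, so the following is only a minimal sketch of the per-runner choice described in the commit message, assuming the value of `runner.environment` (`github-hosted` or `self-hosted`) is passed in as an argument. The function names `pick_apt_mirror` and `pick_apt_ports_mirror` are hypothetical, and rejecting unknown values is an assumption rather than documented behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: choose an Ubuntu mirror from the runner environment.
# The URLs come from the commit message; the function names are illustrative.

pick_apt_mirror() {
  local runner_env="$1"
  case "$runner_env" in
    # GitHub-hosted runners can reach Azure's mirror.
    github-hosted) echo "http://azure.archive.ubuntu.com" ;;
    # Self-hosted runners cannot route to Azure, so use a public mirror.
    self-hosted)   echo "https://mirrors.edge.kernel.org" ;;
    # Assumption: refuse unknown values instead of guessing a mirror.
    *) echo "unknown runner environment: $runner_env" >&2; return 1 ;;
  esac
}

pick_apt_ports_mirror() {
  local runner_env="$1"
  case "$runner_env" in
    github-hosted) echo "http://azure.ports.ubuntu.com" ;;
    self-hosted)   echo "https://mirrors.edge.kernel.org" ;;
    *) echo "unknown runner environment: $runner_env" >&2; return 1 ;;
  esac
}
```

In a workflow step, the result could be forwarded with something like `echo "effective-mirror=$(pick_apt_mirror "$RUNNER_ENVIRONMENT")" >> "$GITHUB_OUTPUT"`, where `RUNNER_ENVIRONMENT` is an assumed environment variable populated from `${{ runner.environment }}`.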
--- a/.github/workflows/secscan.yaml
+++ b/.github/workflows/secscan.yaml
@@ -14,17 +14,17 @@ jobs:
      GO111MODULE: on
    steps:
      - name: Checkout Source
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
        if: ${{ github.actor != 'dependabot[bot]' }}
      - name: Run Gosec Security Scanner
        if: ${{ github.actor != 'dependabot[bot]' }}
-        uses: securego/gosec@v2.22.0
+        uses: securego/gosec@v2.22.9
        with:
          # we let the report trigger content trigger a failure using the GitHub Security features.
          args: '-no-fail -fmt sarif -out results.sarif ./...'
      - name: Upload SARIF file
        if: ${{ github.actor != 'dependabot[bot]' }}
-        uses: github/codeql-action/upload-sarif@v3
+        uses: github/codeql-action/upload-sarif@v4
        with:
          # Path to SARIF file relative to the root of the repository
          sarif_file: results.sarif
--- a/.github/workflows/stalebot.yml
+++ b/.github/workflows/stalebot.yml
@@ -0,0 +1,25 @@
+name: 'Close stale issues and PRs'
+permissions:
+  issues: write
+  pull-requests: write
+on:
+  schedule:
+    - cron: '30 1 * * *'
+
+jobs:
+  stale:
+    if: github.repository == 'mudler/LocalAI'
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/stale@b5d41d4e1d5dceea10e7104786b73624c18a190f # v9
+        with:
+          stale-issue-message: 'This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.'
+          stale-pr-message: 'This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 10 days.'
+          close-issue-message: 'This issue was closed because it has been stalled for 5 days with no activity.'
+          close-pr-message: 'This PR was closed because it has been stalled for 10 days with no activity.'
+          days-before-issue-stale: 90
+          days-before-pr-stale: 90
+          days-before-issue-close: 5
+          days-before-pr-close: 10
+          exempt-issue-labels: 'roadmap'
+          exempt-pr-labels: 'roadmap'
--- a/.github/workflows/test-extra.yml
+++ b/.github/workflows/test-extra.yml
@@ -14,11 +14,77 @@ concurrency:
  cancel-in-progress: true

 jobs:
+  detect-changes:
+    runs-on: ubuntu-latest
+    outputs:
+      run-all: ${{ steps.detect.outputs.run-all }}
+      transformers: ${{ steps.detect.outputs.transformers }}
+      rerankers: ${{ steps.detect.outputs.rerankers }}
+      diffusers: ${{ steps.detect.outputs.diffusers }}
+      coqui: ${{ steps.detect.outputs.coqui }}
+      moonshine: ${{ steps.detect.outputs.moonshine }}
+      pocket-tts: ${{ steps.detect.outputs.pocket-tts }}
+      qwen-tts: ${{ steps.detect.outputs.qwen-tts }}
+      qwen-asr: ${{ steps.detect.outputs.qwen-asr }}
+      nemo: ${{ steps.detect.outputs.nemo }}
+      voxcpm: ${{ steps.detect.outputs.voxcpm }}
+      llama-cpp-quantization: ${{ steps.detect.outputs.llama-cpp-quantization }}
+      llama-cpp: ${{ steps.detect.outputs.llama-cpp }}
+      ik-llama-cpp: ${{ steps.detect.outputs.ik-llama-cpp }}
+      turboquant: ${{ steps.detect.outputs.turboquant }}
+      vllm: ${{ steps.detect.outputs.vllm }}
+      sglang: ${{ steps.detect.outputs.sglang }}
+      acestep-cpp: ${{ steps.detect.outputs.acestep-cpp }}
+      qwen3-tts-cpp: ${{ steps.detect.outputs.qwen3-tts-cpp }}
+      vibevoice-cpp: ${{ steps.detect.outputs.vibevoice-cpp }}
+      voxtral: ${{ steps.detect.outputs.voxtral }}
+      kokoros: ${{ steps.detect.outputs.kokoros }}
+      insightface: ${{ steps.detect.outputs.insightface }}
+      speaker-recognition: ${{ steps.detect.outputs.speaker-recognition }}
+      sherpa-onnx: ${{ steps.detect.outputs.sherpa-onnx }}
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6
+      - name: Setup Bun
+        uses: oven-sh/setup-bun@v2
+      - name: Install dependencies
+        run: bun add js-yaml @octokit/core
+      - name: Detect changed backends
+        id: detect
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_EVENT_PATH: ${{ github.event_path }}
+        run: bun run scripts/changed-backends.js
+
+  # Requires CUDA
+  # tests-chatterbox-tts:
+  #   runs-on: ubuntu-latest
+  #   steps:
+  #     - name: Clone
+  #       uses: actions/checkout@v6
+  #       with:
+  #         submodules: true
+  #     - name: Dependencies
+  #       run: |
+  #         sudo apt-get update
+  #         sudo apt-get install -y build-essential ffmpeg
+  #         # Install UV
+  #         curl -LsSf https://astral.sh/uv/install.sh | sh
+  #         sudo apt-get install -y ca-certificates cmake curl patch python3-pip
+  #         sudo apt-get install -y libopencv-dev
+  #         pip install --user --no-cache-dir grpcio-tools==1.64.1
+
+  #     - name: Test chatterbox-tts
+  #       run: |
+  #          make --jobs=5 --output-sync=target -C backend/python/chatterbox
+  #          make --jobs=5 --output-sync=target -C backend/python/chatterbox test
  tests-transformers:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.transformers == 'true' || needs.detect-changes.outputs.run-all == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
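The hunk above adds a `detect-changes` job that publishes one boolean output per backend, and gates `tests-transformers` on `outputs.transformers == 'true' || outputs.run-all == 'true'`. The real detection lives in `scripts/changed-backends.js`, which is not shown here. The sketch below only illustrates the idea in shell, assuming a backend counts as changed when any modified path sits under `backend/python/<name>/`. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of per-backend change detection and job gating.
# Assumption: a backend is "changed" when a path under backend/python/<name>/
# was modified; the actual script may apply different rules.

backend_changed() {
  local backend="$1"; shift
  local path
  for path in "$@"; do
    case "$path" in
      "backend/python/$backend/"*) echo true; return 0 ;;
    esac
  done
  echo false
}

# Mirrors the workflow condition:
#   outputs.<backend> == 'true' || outputs.run-all == 'true'
should_run() {
  local backend_flag="$1" run_all="$2"
  [ "$backend_flag" = "true" ] || [ "$run_all" = "true" ]
}
```

A step with `id: detect` would expose such a value to dependent jobs by appending a line like `transformers=true` to the file named by `$GITHUB_OUTPUT`, which is the standard GitHub Actions mechanism for step outputs.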
@@ -36,10 +102,12 @@ jobs:
           make --jobs=5 --output-sync=target -C backend/python/transformers
           make --jobs=5 --output-sync=target -C backend/python/transformers test
  tests-rerankers:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.rerankers == 'true' || needs.detect-changes.outputs.run-all == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -58,10 +126,12 @@ jobs:
           make --jobs=5 --output-sync=target -C backend/python/rerankers test

  tests-diffusers:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.diffusers == 'true' || needs.detect-changes.outputs.run-all == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
@@ -78,11 +148,31 @@ jobs:
          make --jobs=5 --output-sync=target -C backend/python/diffusers
          make --jobs=5 --output-sync=target -C backend/python/diffusers test

+  #tests-vllm:
+  #  runs-on: ubuntu-latest
+  #  steps:
+  #    - name: Clone
+  #      uses: actions/checkout@v6
+  #      with:
+  #        submodules: true
+  #    - name: Dependencies
+  #      run: |
+  #        sudo apt-get update
+  #        sudo apt-get install -y build-essential ffmpeg
+  #        sudo apt-get install -y ca-certificates cmake curl patch python3-pip
+  #        sudo apt-get install -y libopencv-dev
+  #        # Install UV
+  #        curl -LsSf https://astral.sh/uv/install.sh | sh
+  #        pip install --user --no-cache-dir grpcio-tools==1.64.1
+  #    - name: Test vllm backend
+  #      run: |
+  #        make --jobs=5 --output-sync=target -C backend/python/vllm
+  #        make --jobs=5 --output-sync=target -C backend/python/vllm test
  # tests-transformers-musicgen:
  #   runs-on: ubuntu-latest
  #   steps:
  #     - name: Clone
-  #       uses: actions/checkout@v4
+  #       uses: actions/checkout@v6
  #       with:
  #         submodules: true
  #     - name: Dependencies
@@ -144,7 +234,7 @@ jobs:
  #           sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
  #           df -h
  #     - name: Clone
-  #       uses: actions/checkout@v4
+  #       uses: actions/checkout@v6
  #       with:
  #         submodules: true
  #     - name: Dependencies
@@ -169,7 +259,7 @@ jobs:
  #   runs-on: ubuntu-latest
  #   steps:
  #     - name: Clone
-  #       uses: actions/checkout@v4
+  #       uses: actions/checkout@v6
  #       with:
  #         submodules: true
  #     - name: Dependencies
@@ -187,16 +277,18 @@ jobs:
  #          make --jobs=5 --output-sync=target -C backend/python/vllm test

  tests-coqui:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.coqui == 'true' || needs.detect-changes.outputs.run-all == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Dependencies
        run: |
          sudo apt-get update
-          sudo apt-get install build-essential ffmpeg
+          sudo apt-get install -y build-essential ffmpeg
          sudo apt-get install -y ca-certificates cmake curl patch espeak espeak-ng python3-pip
          # Install UV
          curl -LsSf https://astral.sh/uv/install.sh | sh
@@ -205,3 +297,697 @@ jobs:
        run: |
          make --jobs=5 --output-sync=target -C backend/python/coqui
          make --jobs=5 --output-sync=target -C backend/python/coqui test
+  tests-moonshine:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.moonshine == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential ffmpeg
+          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
+          # Install UV
+          curl -LsSf https://astral.sh/uv/install.sh | sh
+          pip install --user --no-cache-dir grpcio-tools==1.64.1
+      - name: Test moonshine
+        run: |
+          make --jobs=5 --output-sync=target -C backend/python/moonshine
+          make --jobs=5 --output-sync=target -C backend/python/moonshine test
+  tests-pocket-tts:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.pocket-tts == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential ffmpeg
+          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
+          # Install UV
+          curl -LsSf https://astral.sh/uv/install.sh | sh
+          pip install --user --no-cache-dir grpcio-tools==1.64.1
+      - name: Test pocket-tts
+        run: |
+          make --jobs=5 --output-sync=target -C backend/python/pocket-tts
+          make --jobs=5 --output-sync=target -C backend/python/pocket-tts test
+  tests-qwen-tts:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.qwen-tts == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential ffmpeg
+          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
+          # Install UV
+          curl -LsSf https://astral.sh/uv/install.sh | sh
+          pip install --user --no-cache-dir grpcio-tools==1.64.1
+      - name: Test qwen-tts
+        run: |
+          make --jobs=5 --output-sync=target -C backend/python/qwen-tts
+          make --jobs=5 --output-sync=target -C backend/python/qwen-tts test
+  # TODO: s2-pro model is too large to load on CPU-only CI runners — re-enable
+  # when we have GPU runners or a smaller test model.
+  # tests-fish-speech:
+  #   runs-on: ubuntu-latest
+  #   timeout-minutes: 45
+  #   steps:
+  #     - name: Clone
+  #       uses: actions/checkout@v6
+  #       with:
+  #         submodules: true
+  #     - name: Dependencies
+  #       run: |
+  #         sudo apt-get update
+  #         sudo apt-get install -y build-essential ffmpeg portaudio19-dev
+  #         sudo apt-get install -y ca-certificates cmake curl patch python3-pip
+  #         # Install UV
+  #         curl -LsSf https://astral.sh/uv/install.sh | sh
+  #         pip install --user --no-cache-dir grpcio-tools==1.64.1
+  #     - name: Test fish-speech
+  #       run: |
+  #         make --jobs=5 --output-sync=target -C backend/python/fish-speech
+  #         make --jobs=5 --output-sync=target -C backend/python/fish-speech test
+  tests-qwen-asr:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.qwen-asr == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential ffmpeg sox
+          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
+          # Install UV
+          curl -LsSf https://astral.sh/uv/install.sh | sh
+          pip install --user --no-cache-dir grpcio-tools==1.64.1
+      - name: Test qwen-asr
+        run: |
+          make --jobs=5 --output-sync=target -C backend/python/qwen-asr
+          make --jobs=5 --output-sync=target -C backend/python/qwen-asr test
+  tests-nemo:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.nemo == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential ffmpeg sox
+          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
+          # Install UV
+          curl -LsSf https://astral.sh/uv/install.sh | sh
+          pip install --user --no-cache-dir grpcio-tools==1.64.1
+      - name: Test nemo
+        run: |
+          make --jobs=5 --output-sync=target -C backend/python/nemo
+          make --jobs=5 --output-sync=target -C backend/python/nemo test
+  tests-voxcpm:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.voxcpm == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential ffmpeg
+          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
+          # Install UV
+          curl -LsSf https://astral.sh/uv/install.sh | sh
+          pip install --user --no-cache-dir grpcio-tools==1.64.1
+      - name: Test voxcpm
+        run: |
+          make --jobs=5 --output-sync=target -C backend/python/voxcpm
+          make --jobs=5 --output-sync=target -C backend/python/voxcpm test
+  tests-llama-cpp-quantization:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.llama-cpp-quantization == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential cmake curl git python3-pip
+          # Install UV
+          curl -LsSf https://astral.sh/uv/install.sh | sh
+          pip install --user --no-cache-dir grpcio-tools==1.64.1
+      - name: Build llama-quantize from llama.cpp
+        run: |
+          git clone --depth 1 https://github.com/ggml-org/llama.cpp.git /tmp/llama.cpp
+          cmake -B /tmp/llama.cpp/build -S /tmp/llama.cpp -DGGML_NATIVE=OFF
+          cmake --build /tmp/llama.cpp/build --target llama-quantize -j$(nproc)
+          sudo cp /tmp/llama.cpp/build/bin/llama-quantize /usr/local/bin/
+      - name: Install backend
+        run: |
+          make --jobs=5 --output-sync=target -C backend/python/llama-cpp-quantization
+      - name: Test llama-cpp-quantization
+        run: |
+          make --jobs=5 --output-sync=target -C backend/python/llama-cpp-quantization test
+  tests-llama-cpp-grpc:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.llama-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 90
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.25.4'
+      - name: Build llama-cpp backend image and run gRPC e2e tests
+        run: |
+          make test-extra-backend-llama-cpp
+  tests-llama-cpp-grpc-transcription:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.llama-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 90
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.25.4'
+      - name: Build llama-cpp backend image and run audio transcription gRPC e2e tests
+        run: |
+          make test-extra-backend-llama-cpp-transcription
+  # PR-acceptance smoke gate: always runs on every PR (no detect-changes gate, no
+  # paths filter). Pulls the pre-built master CPU llama-cpp image from quay
+  # instead of building from source, so the cost is a docker pull (~30s) plus the
+  # short Qwen3-0.6B model download. Exercises the full gRPC surface — health,
+  # load, predict, stream — plus the logprobs/logit_bias specs that moved out of
+  # core/http/app_test.go. Anything heavier or per-backend is gated to the
+  # detect-changes path-filter above.
+  tests-llama-cpp-smoke:
+    runs-on: ubuntu-latest
+    timeout-minutes: 20
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.25.4'
+      - name: Pull pre-built llama-cpp backend image
+        run: docker pull quay.io/go-skynet/local-ai-backends:master-cpu-llama-cpp
+      - name: Run e2e-backends smoke
+        env:
+          BACKEND_IMAGE: quay.io/go-skynet/local-ai-backends:master-cpu-llama-cpp
+          BACKEND_TEST_CAPS: health,load,predict,stream,logprobs,logit_bias
+        run: |
+          make test-extra-backend
+  # Realtime e2e with sherpa-onnx driving VAD + STT + TTS against a mocked LLM.
+  # Builds the sherpa-onnx Docker image, extracts the rootfs so the e2e suite
+  # can discover the backend binary + shared libs, downloads the three model
+  # bundles (silero-vad, omnilingual-asr, vits-ljs) and drives the realtime
+  # websocket spec end-to-end.
+  tests-sherpa-onnx-realtime:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.sherpa-onnx == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 90
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.25.4'
+      - name: Setup Node.js
+        uses: actions/setup-node@v6
+        with:
+          node-version: '22'
+      - name: Build sherpa-onnx backend image and run realtime e2e tests
+        run: |
+          make test-extra-e2e-realtime-sherpa
+  # Streaming ASR via the sherpa-onnx online recognizer (zipformer
+  # transducer). Exercises both AudioTranscription (buffered) and
+  # AudioTranscriptionStream (real-time deltas) on the e2e-backends
+  # harness.
+  tests-sherpa-onnx-grpc-transcription:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.sherpa-onnx == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 90
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.25.4'
+      - name: Build sherpa-onnx backend image and run streaming ASR gRPC e2e tests
+        run: |
+          make test-extra-backend-sherpa-onnx-transcription
+  # VITS TTS via the sherpa-onnx backend. Drives both TTS (file write) and
+  # TTSStream (PCM chunks) on the e2e-backends harness.
+  tests-sherpa-onnx-grpc-tts:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.sherpa-onnx == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 90
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.25.4'
+      - name: Build sherpa-onnx backend image and run TTS gRPC e2e tests
+        run: |
+          make test-extra-backend-sherpa-onnx-tts
+  tests-ik-llama-cpp-grpc:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.ik-llama-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 90
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.25.4'
+      - name: Build ik-llama-cpp backend image and run gRPC e2e tests
+        run: |
+          make test-extra-backend-ik-llama-cpp
+  tests-turboquant-grpc:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.turboquant == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 90
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.25.4'
+      # Exercises the turboquant (llama.cpp fork) backend with KV-cache
+      # quantization enabled. The convenience target sets
+      # BACKEND_TEST_CACHE_TYPE_K / _V=q8_0, which are plumbed into the
+      # ModelOptions.CacheTypeKey/Value gRPC fields. LoadModel-success +
+      # backend stdout/stderr (captured by the Ginkgo suite) prove the
+      # cache-type config path reaches the fork's KV-cache init.
+      - name: Build turboquant backend image and run gRPC e2e tests
+        run: |
+          make test-extra-backend-turboquant
+  # tests-vllm-grpc is currently disabled in CI.
+  #
+  # The prebuilt vllm CPU wheel is compiled with AVX-512 VNNI/BF16
+  # instructions, and neither ubuntu-latest nor the bigger-runner pool
+  # offers a stable CPU baseline that supports them — runners come
+  # back with different hardware between runs and SIGILL on import of
+  # vllm.model_executor.models.registry. Compiling vllm from source
+  # via FROM_SOURCE=true works on any CPU but takes 30-50 minutes per
+  # run, which is too slow for a smoke test.
+  #
+  # The test itself (tests/e2e-backends + make test-extra-backend-vllm)
+  # is fully working and validated locally on a host with the right
+  # SIMD baseline. Run it manually with:
+  #
+  #   make test-extra-backend-vllm
+  #
+  # Re-enable this job once we have a self-hosted runner label with
+  # guaranteed AVX-512 VNNI/BF16 support, or once the vllm project
+  # publishes a CPU wheel with a wider baseline.
+  #
+  # tests-vllm-grpc:
+  #   needs: detect-changes
+  #   if: needs.detect-changes.outputs.vllm == 'true' || needs.detect-changes.outputs.run-all == 'true'
+  #   runs-on: bigger-runner
+  #   timeout-minutes: 90
+  #   steps:
+  #     - name: Clone
+  #       uses: actions/checkout@v6
+  #       with:
+  #         submodules: true
+  #     - name: Dependencies
+  #       run: |
+  #         sudo apt-get update
+  #         sudo apt-get install -y --no-install-recommends \
+  #             make build-essential curl unzip ca-certificates git tar
+  #     - name: Setup Go
+  #       uses: actions/setup-go@v5
+  #       with:
+  #         go-version: '1.25.4'
+  #     - name: Free disk space
+  #       run: |
+  #         sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
+  #         df -h
+  #     - name: Build vllm (cpu) backend image and run gRPC e2e tests
+  #       run: |
+  #         make test-extra-backend-vllm
+  # tests-sglang-grpc is currently disabled in CI for the same reason as
+  # tests-vllm-grpc: sglang's CPU kernel (sgl-kernel) uses __m512 AVX-512
+  # intrinsics unconditionally in shm.cpp, so the from-source build
+  # requires `-march=sapphirerapids` (already set in install.sh) and the
+  # resulting binary SIGILLs at import on CPUs without AVX-512 VNNI/BF16.
+  # The ubuntu-latest runner pool does not guarantee that ISA baseline.
+  #
+  # The test itself (tests/e2e-backends + make test-extra-backend-sglang)
+  # is fully working and validated locally on a host with the right
+  # SIMD baseline. Run it manually with:
+  #
+  #   make test-extra-backend-sglang
+  #
+  # Re-enable this job once we have a self-hosted runner label with
+  # guaranteed AVX-512 VNNI/BF16 support.
+  #
+  # tests-sglang-grpc:
+  #   needs: detect-changes
+  #   if: needs.detect-changes.outputs.sglang == 'true' || needs.detect-changes.outputs.run-all == 'true'
+  #   runs-on: bigger-runner
+  #   timeout-minutes: 90
+  #   steps:
+  #     - name: Clone
+  #       uses: actions/checkout@v6
+  #       with:
+  #         submodules: true
+  #     - name: Dependencies
+  #       run: |
+  #         sudo apt-get update
+  #         sudo apt-get install -y --no-install-recommends \
+  #             make build-essential curl unzip ca-certificates git tar
+  #     - name: Setup Go
+  #       uses: actions/setup-go@v5
+  #       with:
+  #         go-version: '1.25.4'
+  #     - name: Free disk space
+  #       run: |
+  #         sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
+  #         df -h
+  #     - name: Build sglang (cpu) backend image and run gRPC e2e tests
+  #       run: |
+  #         make test-extra-backend-sglang
+  tests-acestep-cpp:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.acestep-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential cmake curl libopenblas-dev ffmpeg
+      - name: Setup Go
+        uses: actions/setup-go@v5
+      - name: Display Go version
+        run: go version
+      - name: Proto Dependencies
+        run: |
+          # Install protoc
+          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
+          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+          rm protoc.zip
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
+          PATH="$PATH:$HOME/go/bin" make protogen-go
+      - name: Build acestep-cpp
+        run: |
+          make --jobs=5 --output-sync=target -C backend/go/acestep-cpp
+      - name: Test acestep-cpp
+        run: |
+          make --jobs=5 --output-sync=target -C backend/go/acestep-cpp test
+  tests-qwen3-tts-cpp:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.qwen3-tts-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential cmake curl libopenblas-dev ffmpeg
+      - name: Setup Go
+        uses: actions/setup-go@v5
+      - name: Display Go version
+        run: go version
+      - name: Proto Dependencies
+        run: |
+          # Install protoc
+          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
+          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+          rm protoc.zip
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
+          PATH="$PATH:$HOME/go/bin" make protogen-go
+      - name: Build qwen3-tts-cpp
+        run: |
+          make --jobs=5 --output-sync=target -C backend/go/qwen3-tts-cpp
+      - name: Test qwen3-tts-cpp
+        run: |
+          make --jobs=5 --output-sync=target -C backend/go/qwen3-tts-cpp test
+  # Per-backend smoke for vibevoice-cpp: builds the .so + Go binary and
+  # runs `make -C backend/go/vibevoice-cpp test`. test.sh auto-downloads
+  # the published mudler/vibevoice.cpp-models bundle (TTS Q8_0 + ASR Q4_K
+  # + tokenizer + voice) and runs the closed-loop TTS → ASR Go test.
+  tests-vibevoice-cpp:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.vibevoice-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 90
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential cmake curl libopenblas-dev ffmpeg
+      - name: Setup Go
+        uses: actions/setup-go@v5
+      - name: Display Go version
+        run: go version
+      - name: Proto Dependencies
+        run: |
+          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
+          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+          rm protoc.zip
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
+          PATH="$PATH:$HOME/go/bin" make protogen-go
+      - name: Build vibevoice-cpp
+        run: |
+          make --jobs=5 --output-sync=target -C backend/go/vibevoice-cpp
+      - name: Test vibevoice-cpp
+        run: |
+          make --jobs=5 --output-sync=target -C backend/go/vibevoice-cpp test
+  # End-to-end TTS via the e2e-backends gRPC harness. Builds the
+  # vibevoice-cpp Docker image and drives Backend/TTS against it with a
+  # real LocalAI gRPC client.
+  tests-vibevoice-cpp-grpc-tts:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.vibevoice-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 90
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.25.4'
+      - name: Build vibevoice-cpp backend image and run TTS gRPC e2e tests
+        run: |
+          make test-extra-backend-vibevoice-cpp-tts
+  # End-to-end transcription via the e2e-backends gRPC harness. The
+  # vibevoice ASR is a 7B-param model (Q4_K weights ~10 GB on disk)
+  # and the JFK 30 s decode is too heavy for a free 4-core
+  # ubuntu-latest pool runner - two CI attempts got SIGTERM'd during
+  # LoadModel, before the test could even progress. Use the
+  # self-hosted 'bigger-runner' label (same one the GPU image builds
+  # in backend.yml use) and the documented dotnet/ghc/android cache
+  # purge to clear ~10-20 GB of headroom for the model + Docker
+  # image + working dir.
+  tests-vibevoice-cpp-grpc-transcription:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.vibevoice-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: bigger-runner
+    timeout-minutes: 150
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y --no-install-recommends \
+              make build-essential curl unzip ca-certificates git tar
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.25.4'
+      - name: Free disk space
+        run: |
+          sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
+          df -h
+      - name: Build vibevoice-cpp backend image and run ASR gRPC e2e tests
+        run: |
+          make test-extra-backend-vibevoice-cpp-transcription
+  tests-voxtral:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.voxtral == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential cmake curl libopenblas-dev ffmpeg
+      - name: Setup Go
+        uses: actions/setup-go@v5
+      # Print the Go version that setup-go selected
+      - name: Display Go version
+        run: go version
+      - name: Proto Dependencies
+        run: |
+          # Install protoc
+          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
+          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+          rm protoc.zip
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
+          PATH="$PATH:$HOME/go/bin" make protogen-go
+      - name: Build voxtral
+        run: |
+          make --jobs=5 --output-sync=target -C backend/go/voxtral
+      - name: Test voxtral
+        run: |
+          make --jobs=5 --output-sync=target -C backend/go/voxtral test
+  tests-kokoros:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.kokoros == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential cmake pkg-config protobuf-compiler clang libclang-dev
+          sudo apt-get install -y espeak-ng libespeak-ng-dev libsonic-dev libpcaudio-dev libopus-dev libssl-dev
+          curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+          echo "$HOME/.cargo/bin" >> $GITHUB_PATH
+      - name: Build kokoros
+        run: |
+          make -C backend/rust/kokoros kokoros-grpc
+      - name: Test kokoros
+        run: |
+          make -C backend/rust/kokoros test
+  tests-insightface-grpc:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.insightface == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 90
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y --no-install-recommends \
+              make build-essential curl unzip ca-certificates git tar
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.26.0'
+      - name: Free disk space
+        run: |
+          sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
+          df -h
+      - name: Build insightface backend image and run both model configurations
+        run: |
+          make test-extra-backend-insightface-all
+  tests-speaker-recognition-grpc:
+    needs: detect-changes
+    if: needs.detect-changes.outputs.speaker-recognition == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    runs-on: ubuntu-latest
+    timeout-minutes: 90
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y --no-install-recommends \
+              make build-essential curl ca-certificates git tar
+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: '1.26.0'
+      - name: Free disk space
+        run: |
+          sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android /opt/hostedtoolcache/CodeQL || true
+          df -h
+      - name: Build speaker-recognition backend image and run the ECAPA-TDNN configuration
+        run: |
+          make test-extra-backend-speaker-recognition-all
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -3,15 +3,18 @@ name: 'tests'

 on:
  pull_request:
+    paths-ignore:
+      - 'docs/**'
+      - 'examples/**'
+      - 'README.md'
+      - '**/*.md'
+      - 'backend/**'
  push:
    branches:
      - master
    tags:
      - '*'

-env:
-  GRPC_VERSION: v1.65.0
-
 concurrency:
  group: ci-tests-${{ github.head_ref || github.ref }}-${{ github.repository }}
  cancel-in-progress: true
@@ -21,8 +24,22 @@ jobs:
    runs-on: ubuntu-latest
    strategy:
      matrix:
-        go-version: ['1.21.x']
+        go-version: ['1.26.x']
    steps:
+      - name: Free Disk Space (Ubuntu)
+        uses: jlumbroso/free-disk-space@main
+        with:
+          # setting this to "true" frees about 6 GB, but it might
+          # remove tools that are actually needed
+          tool-cache: true
+          # all of these default to true, but feel free to set to
+          # "false" if necessary for your workflow
+          android: true
+          dotnet: true
+          haskell: true
+          large-packages: true
+          docker-images: true
+          swap-storage: true
      - name: Release space from worker
        run: |
          echo "Listing top largest packages"
@@ -56,7 +73,7 @@ jobs:
          sudo rm -rfv build || true
          df -h
      - name: Clone
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go ${{ matrix.go-version }}
@@ -67,115 +84,7 @@ jobs:
      # You can test your matrix by printing the current Go version
      - name: Display Go version
        run: go version
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install build-essential ccache upx-ucl curl ffmpeg
-          sudo apt-get install -y libgmock-dev
-          curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
-             sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
-             gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
-             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
-             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
-             sudo apt-get update && \
-             sudo apt-get install -y conda
-          # Install UV
-          curl -LsSf https://astral.sh/uv/install.sh | sh
-          sudo apt-get install -y ca-certificates cmake patch python3-pip unzip
-          sudo apt-get install -y libopencv-dev
-
-          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
-          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
-          rm protoc.zip
-
-          curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
-          sudo dpkg -i cuda-keyring_1.1-1_all.deb
-          sudo apt-get update
-          sudo apt-get install -y cuda-nvcc-${CUDA_VERSION} libcublas-dev-${CUDA_VERSION}
-          export CUDACXX=/usr/local/cuda/bin/nvcc
-
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-
-          # The python3-grpc-tools package in 22.04 is too old
-          pip install --user grpcio-tools
-
-          make -C backend/python/transformers
-
-          # Pre-build piper before we start tests in order to have shared libraries in place
-          make sources/go-piper && \
-          GO_TAGS="tts" make -C sources/go-piper piper.o && \
-          sudo cp -rfv sources/go-piper/piper-phonemize/pi/lib/. /usr/lib/
-        env:
-          CUDA_VERSION: 12-4
-      - name: Cache grpc
-        id: cache-grpc
-        uses: actions/cache@v4
-        with:
-          path: grpc
-          key: ${{ runner.os }}-grpc-${{ env.GRPC_VERSION }}
-      - name: Build grpc
-        if: steps.cache-grpc.outputs.cache-hit != 'true'
-        run: |
-          git clone --recurse-submodules -b ${{ env.GRPC_VERSION }} --depth 1 --jobs 5 --shallow-submodules https://github.com/grpc/grpc && \
-          cd grpc && sed -i "216i\  TESTONLY" "third_party/abseil-cpp/absl/container/CMakeLists.txt" && mkdir -p cmake/build && cd cmake/build && \
-          cmake -DgRPC_INSTALL=ON \
-            -DgRPC_BUILD_TESTS=OFF \
-            ../.. && sudo make --jobs 5
-      - name: Install gRPC
-        run: |
-          cd grpc && cd cmake/build && sudo make --jobs 5 install
-      - name: Test
-        run: |
-          PATH="$PATH:/root/go/bin" GO_TAGS="tts" make --jobs 5 --output-sync=target test
-      - name: Setup tmate session if tests fail
-        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.19
-        with:
-          detached: true
-          connect-timeout-seconds: 180
-          limit-access-to-actor: true
-
-  tests-aio-container:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Release space from worker
-        run: |
-          echo "Listing top largest packages"
-          pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
-          head -n 30 <<< "${pkgs}"
-          echo
-          df -h
-          echo
-          sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
-          sudo apt-get remove --auto-remove android-sdk-platform-tools || true
-          sudo apt-get purge --auto-remove android-sdk-platform-tools || true
-          sudo rm -rf /usr/local/lib/android
-          sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
-          sudo rm -rf /usr/share/dotnet
-          sudo apt-get remove -y '^mono-.*' || true
-          sudo apt-get remove -y '^ghc-.*' || true
-          sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
-          sudo apt-get remove -y 'php.*' || true
-          sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
-          sudo apt-get remove -y '^google-.*' || true
-          sudo apt-get remove -y azure-cli || true
-          sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
-          sudo apt-get remove -y '^gfortran-.*' || true
-          sudo apt-get autoremove -y
-          sudo apt-get clean
-          echo
-          echo "Listing top largest packages"
-          pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
-          head -n 30 <<< "${pkgs}"
-          echo
-          sudo rm -rfv build || true
-          df -h
-      - name: Clone
-        uses: actions/checkout@v4
-        with:
-          submodules: true
-      - name: Dependencies
+      - name: Proto Dependencies
        run: |
          # Install protoc
          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
@@ -184,30 +93,35 @@ jobs:
          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
          PATH="$PATH:$HOME/go/bin" make protogen-go
-      - name: Build images
+      - name: Dependencies
        run: |
-          docker build --build-arg FFMPEG=true --build-arg IMAGE_TYPE=extras --build-arg EXTRA_BACKENDS=rerankers --build-arg MAKEFLAGS="--jobs=5 --output-sync=target" -t local-ai:tests -f Dockerfile .
-          BASE_IMAGE=local-ai:tests DOCKER_AIO_IMAGE=local-ai-aio:test make docker-aio
+          sudo apt-get update
+          sudo apt-get install -y curl ffmpeg libopus-dev
+      - name: Setup Node.js
+        uses: actions/setup-node@v6
+        with:
+          node-version: '22'
+      - name: Build React UI
+        run: make react-ui
      - name: Test
        run: |
-            PATH="$PATH:$HOME/go/bin" LOCALAI_MODELS_DIR=$PWD/models LOCALAI_IMAGE_TAG=test LOCALAI_IMAGE=local-ai-aio \
-            make run-e2e-aio
+          PATH="$PATH:/root/go/bin" make --jobs 5 --output-sync=target test
      - name: Setup tmate session if tests fail
        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.19
+        uses: mxschmitt/action-tmate@v3.23
        with:
          detached: true
          connect-timeout-seconds: 180
          limit-access-to-actor: true

  tests-apple:
-    runs-on: macOS-14
+    runs-on: macos-latest
    strategy:
      matrix:
-        go-version: ['1.21.x']
+        go-version: ['1.26.x']
    steps:
      - name: Clone
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
        with:
          submodules: true
      - name: Setup Go ${{ matrix.go-version }}
@@ -220,8 +134,14 @@ jobs:
        run: go version
      - name: Dependencies
        run: |
-          brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm
-          pip install --user --no-cache-dir grpcio-tools
+          brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm opus ffmpeg
+          pip install --user --no-cache-dir grpcio-tools grpcio
+      - name: Setup Node.js
+        uses: actions/setup-node@v6
+        with:
+          node-version: '22'
+      - name: Build React UI
+        run: make react-ui
      - name: Test
        run: |
          export C_INCLUDE_PATH=/usr/local/include
@@ -229,10 +149,11 @@ jobs:
          export CC=/opt/homebrew/opt/llvm/bin/clang
          # Used to run the newer GNUMake version from brew that supports --output-sync
          export PATH="/opt/homebrew/opt/make/libexec/gnubin:$PATH"
-          BUILD_TYPE="GITHUB_CI_HAS_BROKEN_METAL" CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF" make --jobs 4 --output-sync=target test
+          PATH="$PATH:$HOME/go/bin" make protogen-go
+          PATH="$PATH:$HOME/go/bin" BUILD_TYPE="GITHUB_CI_HAS_BROKEN_METAL" CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF" make --jobs 4 --output-sync=target test
      - name: Setup tmate session if tests fail
        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.19
+        uses: mxschmitt/action-tmate@v3.23
        with:
          detached: true
          connect-timeout-seconds: 180
--- a/.github/workflows/tests-aio.yml
+++ b/.github/workflows/tests-aio.yml
@@ -0,0 +1,86 @@
+---
+name: 'tests-aio'
+
+# Runs the all-in-one (AIO) Docker image with real backends + real models.
+# Heavy: builds llama-cpp/whisper/piper/silero-vad/stablediffusion-ggml/local-store
+# and exercises end-to-end inference inside the container. Moved out of test.yml
+# (which used to run on every PR) so PR CI no longer pays this cost.
+#
+# Triggers:
+#   - schedule (nightly @ 04:00 UTC) — catches packaging/image regressions within 24h
+#   - workflow_dispatch — manual run on-demand
+#   - push to master/tags — sanity check after merge / before release
+
+on:
+  schedule:
+    - cron: '0 4 * * *'
+  workflow_dispatch:
+  push:
+    branches:
+      - master
+    tags:
+      - '*'
+
+concurrency:
+  group: ci-tests-aio-${{ github.head_ref || github.ref }}-${{ github.repository }}
+  cancel-in-progress: true
+
+jobs:
+  tests-aio:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Release space from worker
+        run: |
+          echo "Listing top largest packages"
+          pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
+          head -n 30 <<< "${pkgs}"
+          echo
+          df -h
+          echo
+          sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
+          sudo apt-get remove --auto-remove android-sdk-platform-tools || true
+          sudo apt-get purge --auto-remove android-sdk-platform-tools || true
+          sudo rm -rf /usr/local/lib/android
+          sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
+          sudo rm -rf /usr/share/dotnet
+          sudo apt-get remove -y '^mono-.*' || true
+          sudo apt-get remove -y '^ghc-.*' || true
+          sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
+          sudo apt-get remove -y 'php.*' || true
+          sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
+          sudo apt-get remove -y '^google-.*' || true
+          sudo apt-get remove -y azure-cli || true
+          sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
+          sudo apt-get remove -y '^gfortran-.*' || true
+          sudo apt-get autoremove -y
+          sudo apt-get clean
+          echo
+          echo "Listing top largest packages"
+          pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
+          head -n 30 <<< "${pkgs}"
+          echo
+          sudo rm -rfv build || true
+          df -h
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Dependencies
+        run: |
+          # Install protoc
+          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
+          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+          rm protoc.zip
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
+          PATH="$PATH:$HOME/go/bin" make protogen-go
+      - name: Test
+        run: |
+            PATH="$PATH:$HOME/go/bin" make backends/local-store backends/silero-vad backends/llama-cpp backends/whisper backends/piper backends/stablediffusion-ggml docker-build-e2e e2e-aio
+      - name: Setup tmate session if tests fail
+        if: ${{ failure() }}
+        uses: mxschmitt/action-tmate@v3.23
+        with:
+          detached: true
+          connect-timeout-seconds: 180
+          limit-access-to-actor: true
--- a/.github/workflows/tests-e2e.yml
+++ b/.github/workflows/tests-e2e.yml
@@ -0,0 +1,70 @@
+---
+name: 'E2E Backend Tests'
+
+on:
+  pull_request:
+    paths-ignore:
+      - 'docs/**'
+      - 'examples/**'
+      - 'README.md'
+      - '**/*.md'
+      - 'backend/**'
+  push:
+    branches:
+      - master
+    tags:
+      - '*'
+
+concurrency:
+  group: ci-tests-e2e-backend-${{ github.head_ref || github.ref }}-${{ github.repository }}
+  cancel-in-progress: true
+
+jobs:
+  tests-e2e-backend:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        go-version: ['1.25.x']
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Configure apt mirror on runner
+        uses: ./.github/actions/configure-apt-mirror
+      - name: Setup Go ${{ matrix.go-version }}
+        uses: actions/setup-go@v5
+        with:
+          go-version: ${{ matrix.go-version }}
+          cache: false
+      - name: Display Go version
+        run: go version
+      - name: Proto Dependencies
+        run: |
+          # Install protoc
+          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
+          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+          rm protoc.zip
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
+          PATH="$PATH:$HOME/go/bin" make protogen-go
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential libopus-dev
+      - name: Setup Node.js
+        uses: actions/setup-node@v6
+        with:
+          node-version: '22'
+      - name: Build React UI
+        run: make react-ui
+      - name: Test Backend E2E
+        run: |
+          PATH="$PATH:$HOME/go/bin" make build-mock-backend test-e2e
+      - name: Setup tmate session if tests fail
+        if: ${{ failure() }}
+        uses: mxschmitt/action-tmate@v3.23
+        with:
+          detached: true
+          connect-timeout-seconds: 180
+          limit-access-to-actor: true
--- a/.github/workflows/tests-ui-e2e.yml
+++ b/.github/workflows/tests-ui-e2e.yml
@@ -0,0 +1,74 @@
+---
+name: 'UI E2E Tests'
+
+on:
+  pull_request:
+    paths:
+      - 'core/http/**'
+      - 'tests/e2e-ui/**'
+      - 'tests/e2e/mock-backend/**'
+  push:
+    branches:
+      - master
+
+concurrency:
+  group: ci-tests-ui-e2e-${{ github.head_ref || github.ref }}-${{ github.repository }}
+  cancel-in-progress: true
+
+jobs:
+  tests-ui-e2e:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        go-version: ['1.26.x']
+    steps:
+      - name: Clone
+        uses: actions/checkout@v6
+        with:
+          submodules: true
+      - name: Configure apt mirror on runner
+        uses: ./.github/actions/configure-apt-mirror
+      - name: Setup Go ${{ matrix.go-version }}
+        uses: actions/setup-go@v5
+        with:
+          go-version: ${{ matrix.go-version }}
+          cache: false
+      - name: Setup Node.js
+        uses: actions/setup-node@v6
+        with:
+          node-version: '22'
+      - name: Proto Dependencies
+        run: |
+          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
+          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+          rm protoc.zip
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
+      - name: System Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y build-essential libopus-dev
+      - name: Build UI test server
+        run: PATH="$PATH:$HOME/go/bin" make build-ui-test-server
+      - name: Install Playwright
+        working-directory: core/http/react-ui
+        run: |
+          npm install
+          npx playwright install --with-deps chromium
+      - name: Run Playwright tests
+        working-directory: core/http/react-ui
+        run: npx playwright test
+      - name: Upload Playwright report
+        if: ${{ failure() }}
+        uses: actions/upload-artifact@v7
+        with:
+          name: playwright-report
+          path: core/http/react-ui/playwright-report/
+          retention-days: 7
+      - name: Setup tmate session if tests fail
+        if: ${{ failure() }}
+        uses: mxschmitt/action-tmate@v3.23
+        with:
+          detached: true
+          connect-timeout-seconds: 180
+          limit-access-to-actor: true
--- a/.github/workflows/update_swagger.yaml
+++ b/.github/workflows/update_swagger.yaml
@@ -5,11 +5,14 @@ on:
  workflow_dispatch:
 jobs:
  swagger:
+    if: github.repository == 'mudler/LocalAI'
    strategy:
      fail-fast: false
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
+      - name: Configure apt mirror on runner
+        uses: ./.github/actions/configure-apt-mirror
      - uses: actions/setup-go@v5
        with:
          go-version: 'stable'
@@ -25,7 +28,7 @@ jobs:
        run: |
          make protogen-go swagger
      - name: Create Pull Request
-        uses: peter-evans/create-pull-request@v7
+        uses: peter-evans/create-pull-request@v8
        with:
          token: ${{ secrets.UPDATE_BOT_TOKEN }}
          push-to-fork: ci-forks/LocalAI
--- a/.github/workflows/yaml-check.yml
+++ b/.github/workflows/yaml-check.yml
@@ -8,7 +8,7 @@ jobs:
    steps:
      - name: 'Checkout'
        uses: actions/checkout@master
-      - name: 'Yamllint'
+      - name: 'Yamllint model gallery'
        uses: karancode/yamllint-github-action@master
        with:
          yamllint_file_or_dir: 'gallery'
@@ -16,3 +16,11 @@ jobs:
          yamllint_comment: true
        env:
          GITHUB_ACCESS_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      - name: 'Yamllint Backend gallery'
+        uses: karancode/yamllint-github-action@master
+        with:
+          yamllint_file_or_dir: 'backend'
+          yamllint_strict: false
+          yamllint_comment: true
+        env:
+          GITHUB_ACCESS_TOKEN: ${{ secrets.GITHUB_TOKEN }}
--- a/.gitignore
+++ b/.gitignore
@@ -5,9 +5,14 @@ __pycache__/
 *.o
 get-sources
 prepare-sources
-/backend/cpp/llama/grpc-server
-/backend/cpp/llama/llama.cpp
+/backend/cpp/llama-cpp/grpc-server
+/backend/cpp/llama-cpp/llama.cpp
 /backend/cpp/llama-*
+!backend/cpp/llama-cpp
+/backends
+/backend-images
+/result.yaml
+protoc

 *.log

@@ -19,7 +24,8 @@ go-bert

 # LocalAI build binary
 LocalAI
-local-ai
+/local-ai
+/local-ai-launcher
 # prevent above rules from omitting the helm chart
 !charts/*
 # prevent above rules from omitting the api/localai folder
@@ -30,6 +36,8 @@ local-ai
 models/*
 test-models/
 test-dir/
+tests/e2e-aio/backends
+mock-backend

 release/

@@ -56,4 +64,16 @@ docs/static/gallery.html
 **/venv

 # per-developer customization files for the development container
-.devcontainer/customization/*
+.devcontainer/customization/*
+
+# React UI build artifacts (keep placeholder dist/index.html)
+core/http/react-ui/node_modules/
+core/http/react-ui/dist
+
+# Extracted backend binaries for container-based testing
+local-backends/
+
+# UI E2E test artifacts
+tests/e2e-ui/ui-test-server
+core/http/react-ui/playwright-report/
+core/http/react-ui/test-results/
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,6 +1,6 @@
 [submodule "docs/themes/hugo-theme-relearn"]
 	path = docs/themes/hugo-theme-relearn
 	url = https://github.com/McShelby/hugo-theme-relearn.git
-[submodule "docs/themes/lotusdocs"]
-	path = docs/themes/lotusdocs
-	url = https://github.com/colinwilson/lotusdocs
+[submodule "backend/rust/kokoros/sources/Kokoros"]
+	path = backend/rust/kokoros/sources/Kokoros
+	url = https://github.com/lucasjinreal/Kokoros
--- a/.golangci.yml
+++ b/.golangci.yml
@@ -0,0 +1,53 @@
+version: "2"
+
+# Only issues introduced relative to master are reported. Pre-existing issues
+# in the codebase do not fail the lint job; they're treated as a baseline that
+# can be cleaned up incrementally. New code (added lines on a branch) is held
+# to the full linter set. Locally, `make lint-all` overrides this and reports
+# every issue.
+issues:
+  # origin/master because in shallow CI checkouts only the remote-tracking
+  # branch exists; a bare 'master' ref isn't reachable locally.
+  new-from-merge-base: origin/master
+
+linters:
+  default: standard
+  # staticcheck is noisy on this codebase (mostly QF style suggestions like
+  # "could use tagged switch" or "unnecessary fmt.Sprintf"). Re-enable
+  # selectively if a high-signal subset is identified.
+  disable:
+    - staticcheck
+  enable:
+    - forbidigo
+  settings:
+    forbidigo:
+      forbid:
+        - pattern: '^t\.Errorf$'
+          msg: 'LocalAI tests must use Ginkgo/Gomega; use Expect(...).To(...) instead of t.Errorf. See .agents/coding-style.md.'
+        - pattern: '^t\.Error$'
+          msg: 'LocalAI tests must use Ginkgo/Gomega; use Expect(...).To(...) instead of t.Error. See .agents/coding-style.md.'
+        - pattern: '^t\.Fatalf$'
+          msg: 'LocalAI tests must use Ginkgo/Gomega; use Expect(...).To(Succeed()) / Fail(...) instead of t.Fatalf. See .agents/coding-style.md.'
+        - pattern: '^t\.Fatal$'
+          msg: 'LocalAI tests must use Ginkgo/Gomega; use Expect(...).To(Succeed()) / Fail(...) instead of t.Fatal. See .agents/coding-style.md.'
+        - pattern: '^t\.Run$'
+          msg: 'LocalAI tests must use Ginkgo/Gomega; use Describe/Context/It instead of t.Run. See .agents/coding-style.md.'
+        - pattern: '^t\.Skip$'
+          msg: 'LocalAI tests must use Ginkgo/Gomega; use Skip(...) instead of t.Skip. See .agents/coding-style.md.'
+        - pattern: '^t\.Skipf$'
+          msg: 'LocalAI tests must use Ginkgo/Gomega; use Skip(...) instead of t.Skipf. See .agents/coding-style.md.'
+        - pattern: '^t\.SkipNow$'
+          msg: 'LocalAI tests must use Ginkgo/Gomega; use Skip(...) instead of t.SkipNow. See .agents/coding-style.md.'
+        - pattern: '^t\.Logf$'
+          msg: 'LocalAI tests must use Ginkgo/Gomega; use GinkgoWriter / fmt.Fprintf(GinkgoWriter, ...) instead of t.Logf. See .agents/coding-style.md.'
+        - pattern: '^t\.Log$'
+          msg: 'LocalAI tests must use Ginkgo/Gomega; use GinkgoWriter / fmt.Fprintln(GinkgoWriter, ...) instead of t.Log. See .agents/coding-style.md.'
+        - pattern: '^t\.Fail$'
+          msg: 'LocalAI tests must use Ginkgo/Gomega; use Fail(...) instead of t.Fail. See .agents/coding-style.md.'
+        - pattern: '^t\.FailNow$'
+          msg: 'LocalAI tests must use Ginkgo/Gomega; use Fail(...) instead of t.FailNow. See .agents/coding-style.md.'
+  exclusions:
+    paths:
+      # Upstream whisper.cpp source tree fetched by the whisper backend Makefile.
+      - 'backend/go/whisper/sources'
+      - 'docs/'
--- a/.goreleaser.yaml
+++ b/.goreleaser.yaml
@@ -0,0 +1,37 @@
+version: 2
+before:
+  hooks:
+    - make protogen-go
+    - make react-ui
+    - go mod tidy
+dist: release
+source:
+  enabled: true
+  name_template: '{{ .ProjectName }}-{{ .Tag }}-source'
+builds:
+  - main: ./cmd/local-ai
+    env:
+      - CGO_ENABLED=0
+    ldflags:
+      - -s -w
+      - -X "github.com/mudler/LocalAI/internal.Version={{ .Tag }}"
+      - -X "github.com/mudler/LocalAI/internal.Commit={{ .FullCommit }}"
+    goos:
+      - linux
+      - darwin
+      #- windows
+    goarch:
+      - amd64
+      - arm64
+    ignore:
+      - goos: darwin
+        goarch: amd64
+archives:
+  - formats: [ 'binary' ] # publish the raw binaries instead of wrapping them in tar archives
+    name_template: local-ai-{{ .Tag }}-{{ .Os }}-{{ .Arch }}{{ if .Arm }}v{{ .Arm }}{{ end }}
+checksum:
+  name_template: '{{ .ProjectName }}-{{ .Tag }}-checksums.txt'
+snapshot:
+  version_template: "{{ .Tag }}-next"
+changelog:
+  use: github-native
--- a/.vscode/launch.json
+++ b/.vscode/launch.json
@@ -26,7 +26,7 @@
                "LOCALAI_P2P": "true",
                "LOCALAI_FEDERATED": "true"
            },
-            "buildFlags": ["-tags", "p2p tts", "-v"],
+            "buildFlags": ["-tags", "", "-v"],
            "envFile": "${workspaceFolder}/.env",
            "cwd": "${workspaceRoot}"
        }
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,42 @@
+# LocalAI Agent Instructions
+
+This file is the entry point for AI coding assistants (Claude Code, Cursor, Copilot, Codex, Aider, etc.) working on LocalAI. It is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
+
+Human contributors: see [CONTRIBUTING.md](CONTRIBUTING.md) for the development workflow.
+
+## Policy for AI-Assisted Contributions
+
+LocalAI follows the Linux kernel project's [guidelines for AI coding assistants](https://docs.kernel.org/process/coding-assistants.html). Before submitting AI-assisted code, read [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md). Key rules:
+
+- **No `Signed-off-by` from AI.** Only the human submitter may sign off on the Developer Certificate of Origin.
+- **No `Co-Authored-By: <AI>` trailers.** The human contributor owns the change.
+- **Use an `Assisted-by:` trailer** to attribute AI involvement. Format: `Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]`.
+- **The human submitter is responsible** for reviewing, testing, and understanding every line of generated code.
+
+## Topics
+
+| File | When to read |
+|------|-------------|
+| [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md) | Policy for AI-assisted contributions — licensing, DCO, attribution |
+| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
+| [.agents/ci-caching.md](.agents/ci-caching.md) | CI build cache layout (registry-backed BuildKit cache on quay.io/go-skynet/ci-cache), `DEPS_REFRESH` weekly cache-buster for unpinned Python deps, manual eviction |
+| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist, including importer integration (the `/import-model` dropdown is server-driven from `GET /backends/known`) |
+| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
+| [.agents/llama-cpp-backend.md](.agents/llama-cpp-backend.md) | Working on the llama.cpp backend — architecture, updating, tool call parsing |
+| [.agents/vllm-backend.md](.agents/vllm-backend.md) | Working on the vLLM / vLLM-omni backends — native parsers, ChatDelta, CPU build, libnuma packaging, backend hooks |
+| [.agents/testing-mcp-apps.md](.agents/testing-mcp-apps.md) | Testing MCP Apps (interactive tool UIs) in the React UI |
+| [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) | Adding API endpoints, auth middleware, feature permissions, user access control |
+| [.agents/debugging-backends.md](.agents/debugging-backends.md) | Debugging runtime backend failures, dependency conflicts, rebuilding backends |
+| [.agents/adding-gallery-models.md](.agents/adding-gallery-models.md) | Adding GGUF models from HuggingFace to the model gallery |
+| [.agents/localai-assistant-mcp.md](.agents/localai-assistant-mcp.md) | LocalAI Assistant chat modality — adding admin tools to the in-process MCP server, editing skill prompts, keeping REST + MCP + skills in sync |
+
+## Quick Reference
+
+- **Logging**: Use `github.com/mudler/xlog` (same API as slog)
+- **Go style**: Prefer `any` over `interface{}`
+- **Comments**: Explain *why*, not *what*
+- **Docs**: Update `docs/content/` when adding features or changing config
+- **New API endpoints**: LocalAI advertises its capability surface in several independent places — swagger `@Tags`, `/api/instructions` registry, auth `RouteFeatureRegistry`, React UI `capabilities.js`, docs. Read [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) and follow its checklist — missing any surface means clients, admins, and the UI won't know the endpoint exists.
+- **Admin endpoints → MCP tool**: every admin endpoint that an admin would manage conversationally (install/list/edit/toggle/upgrade) MUST also be exposed as an MCP tool in `pkg/mcp/localaitools/`. The LocalAI Assistant chat modality and the standalone `local-ai mcp-server` consume that package; drift between REST and MCP is a real risk. Read [.agents/localai-assistant-mcp.md](.agents/localai-assistant-mcp.md) — the `TestToolHTTPRouteMappingComplete` test fails until you wire the new tool and update the route map.
+- **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
+- **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+AGENTS.md
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -7,10 +7,13 @@ Thank you for your interest in contributing to LocalAI! We appreciate your time
 - [Getting Started](#getting-started)
  - [Prerequisites](#prerequisites)
  - [Setting up the Development Environment](#setting-up-the-development-environment)
+  - [Environment Variables](#environment-variables)
 - [Contributing](#contributing)
  - [Submitting an Issue](#submitting-an-issue)
+  - [Development Workflow](#development-workflow)
  - [Creating a Pull Request (PR)](#creating-a-pull-request-pr)
 - [Coding Guidelines](#coding-guidelines)
+- [AI Coding Assistants](#ai-coding-assistants)
 - [Testing](#testing)
 - [Documentation](#documentation)
 - [Community and Communication](#community-and-communication)
@@ -19,17 +22,122 @@ Thank you for your interest in contributing to LocalAI! We appreciate your time

 ### Prerequisites

- Golang [1.21]
- Git
- macOS/Linux
+- **Go 1.21+** (the project currently uses Go 1.26 in `go.mod`, but 1.21 is the minimum supported version)
+  - [Download Go](https://go.dev/dl/) or install via your package manager
+  - macOS: `brew install go`
+  - Ubuntu/Debian: follow the [official instructions](https://go.dev/doc/install) (the `apt` version is often outdated)
+  - Verify: `go version`
+- **Git**
+- **GNU Make**
+- **GCC / C/C++ toolchain** (required for CGo and native backends)
+- **Protocol Buffers compiler** (`protoc`) — needed for gRPC code generation

-### Setting up the Development Environment and running localAI in the local environment
+#### System dependencies by platform

-1. Clone the repository: `git clone https://github.com/go-skynet/LocalAI.git`
-2. Navigate to the project directory: `cd LocalAI`
-3. Install the required dependencies ( see https://localai.io/basics/build/#build-localai-locally )
-4. Build LocalAI: `make build`
-5. Run LocalAI: `./local-ai`
+<details>
+<summary><strong>Ubuntu / Debian</strong></summary>
+
+```bash
+sudo apt-get update
+sudo apt-get install -y build-essential gcc g++ cmake git wget \
+  protobuf-compiler libprotobuf-dev pkg-config \
+  libopencv-dev libgrpc-dev
+```
+
+</details>
+
+<details>
+<summary><strong>CentOS / RHEL / Fedora</strong></summary>
+
+```bash
+sudo dnf groupinstall -y "Development Tools"
+sudo dnf install -y cmake git wget protobuf-compiler protobuf-devel \
+  opencv-devel grpc-devel
+```
+
+</details>
+
+<details>
+<summary><strong>macOS</strong></summary>
+
+```bash
+xcode-select --install
+brew install cmake git protobuf grpc opencv wget
+```
+
+</details>
+
+<details>
+<summary><strong>Windows</strong></summary>
+
+Use [WSL 2](https://learn.microsoft.com/en-us/windows/wsl/install) with an Ubuntu distribution, then follow the Ubuntu instructions above.
+
+</details>
+
+### Setting up the Development Environment
+
+1. **Clone the repository:**
+
+   ```bash
+   git clone https://github.com/mudler/LocalAI.git
+   cd LocalAI
+   ```
+
+2. **Build LocalAI:**
+
+   ```bash
+   make build
+   ```
+
+   This runs protobuf generation, installs Go tools, builds the React UI, and compiles the `local-ai` binary. Key build variables you can set:
+
+   | Variable | Description | Example |
+   |---|---|---|
+   | `BUILD_TYPE` | GPU/accelerator type (`cublas`, `hipblas`, `intel`, or empty for the default build) | `BUILD_TYPE=cublas make build` |
+   | `GO_TAGS` | Additional Go build tags | `GO_TAGS=debug make build` |
+   | `CUDA_MAJOR_VERSION` | CUDA major version (default: `13`) | `CUDA_MAJOR_VERSION=12` |
+
+3. **Run LocalAI:**
+
+   ```bash
+   ./local-ai
+   ```
+
+4. **Development mode with live reload:**
+
+   ```bash
+   make build-dev
+   ```
+
+   This installs [`air`](https://github.com/air-verse/air) automatically and watches for file changes, rebuilding and restarting the server on each save.
+
+5. **Containerized build** (no local toolchain needed):
+
+   ```bash
+   make docker
+   ```
+
+   For GPU-specific Docker builds, see the `docker-build-*` targets in the Makefile and refer to [CLAUDE.md](CLAUDE.md) for detailed backend build instructions.
+
+### Environment Variables
+
+LocalAI is configured primarily through environment variables (or equivalent CLI flags). The most useful ones for development are:
+
+| Variable | Description | Default |
+|---|---|---|
+| `LOCALAI_DEBUG` | Enable debug mode | `false` |
+| `LOCALAI_LOG_LEVEL` | Log verbosity (`error`, `warn`, `info`, `debug`, `trace`) | — |
+| `LOCALAI_LOG_FORMAT` | Log format (`default`, `text`, `json`) | `default` |
+| `LOCALAI_MODELS_PATH` | Path to model files | `./models` |
+| `LOCALAI_BACKENDS_PATH` | Path to backend binaries | `./backends` |
+| `LOCALAI_CONFIG_DIR` | Directory for dynamic config files (API keys, external backends) | `./configuration` |
+| `LOCALAI_THREADS` | Number of threads for inference | — |
+| `LOCALAI_ADDRESS` | Bind address for the API server | `:8080` |
+| `LOCALAI_API_KEY` | API key(s) for authentication | — |
+| `LOCALAI_CORS` | Enable CORS | `false` |
+| `LOCALAI_DISABLE_WEBUI` | Disable the web UI | `false` |
+
+See `core/cli/run.go` for the full list of supported environment variables.
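+
+As a rough sketch, a local debugging session could export a few of these variables before starting the server. The values below are illustrative choices, not recommended defaults:
+
+```bash
+# Variable names come from the table above; the values are examples only.
+export LOCALAI_DEBUG=true
+export LOCALAI_LOG_LEVEL=debug
+export LOCALAI_MODELS_PATH=./models
+export LOCALAI_ADDRESS=:8080
+
+# Assumes the binary was already built with `make build`.
+./local-ai
+```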

 ## Contributing

@@ -39,44 +147,163 @@ We welcome contributions from everyone! To get started, follow these steps:

 If you find a bug, have a feature request, or encounter any issues, please check the [issue tracker](https://github.com/go-skynet/LocalAI/issues) to see if a similar issue has already been reported. If not, feel free to [create a new issue](https://github.com/go-skynet/LocalAI/issues/new) and provide as much detail as possible.

-### Creating a Pull Request (PR)
+### Development Workflow
+
+#### Branch naming conventions
+
+Use a descriptive branch name that indicates the type and scope of the change:
+
+- `feature/<short-description>` — new functionality
+- `fix/<short-description>` — bug fixes
+- `docs/<short-description>` — documentation changes
+- `refactor/<short-description>` — code refactoring
+
+#### Commit messages
+
+- Use a short, imperative subject line (e.g., "feat: add whisper backend support", not "Added whisper backend support")
+- Keep the subject under 72 characters
+- Use the body to explain **why** the change was made when the subject alone is not sufficient
+- Use [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/)
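+
+A message that follows these conventions might look like the sketch below. The change it describes is only an example:
+
+```
+fix(ci): pick apt mirror per runner environment
+
+Self-hosted runners cannot reach the Azure mirror, so apt-get update
+stalls until it times out. Select the mirror from runner.environment
+so each runner uses one it can reach.
+```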
+
+#### Creating a Pull Request (PR)
+
+Before opening a PR for a large feature or other major change, please discuss it in an issue first.

 1. Fork the repository.
-2. Create a new branch with a descriptive name: `git checkout -b [branch name]`
-3. Make your changes and commit them.
-4. Push the changes to your fork: `git push origin [branch name]`
-5. Create a new pull request from your branch to the main project's `main` or `master` branch.
-6. Provide a clear description of your changes in the pull request.
-7. Make any requested changes during the review process.
-8. Once your PR is approved, it will be merged into the main project.
+2. Create a new branch: `git checkout -b feature/my-change`
+3. Make your changes, keeping commits focused and atomic.
+4. Run tests locally before pushing (see [Testing](#testing) below).
+5. Push to your fork: `git push origin feature/my-change`
+6. Open a pull request against the `master` branch.
+7. Fill in the PR description with:
+   - What the change does and why
+   - How it was tested
+   - Any breaking changes or migration steps
+8. Respond to review feedback promptly. Push follow-up commits rather than force-pushing amended commits so reviewers can see incremental changes.
+9. Once approved, a maintainer will merge your PR.

 ## Coding Guidelines

- No specific coding guidelines at the moment. Please make sure the code can be tested. The most popular lint tools like [`golangci-lint`](https://golangci-lint.run) can help you here.
+This project uses an [`.editorconfig`](.editorconfig) file to define formatting standards (indentation, line endings, charset, etc.). Please configure your editor to respect it.
+
+For AI-assisted development, see [`AGENTS.md`](AGENTS.md) (or the equivalent [`CLAUDE.md`](CLAUDE.md) symlink) for agent-specific guidelines including build instructions and backend architecture details. Contributions produced with AI assistance must follow the rules in the [AI Coding Assistants](#ai-coding-assistants) section below.
+
+### General Principles
+
+- Write code that can be tested. All new features and bug fixes should include test coverage.
+- Use comments sparingly to explain **why** code does something, not **what** it does. Comments should add context that would be difficult to deduce from reading the code alone.
+- Keep changes focused. Avoid unrelated refactors, formatting changes, or feature additions in the same PR.
+
+### Go Code
+
+- Prefer modern Go idioms — for example, use `any` instead of `interface{}`.
+- Use [`golangci-lint`](https://golangci-lint.run) to catch common issues before submitting a PR.
+- Use [`github.com/mudler/xlog`](https://github.com/mudler/xlog) for logging (same API as `slog`). Do not use `fmt.Println` or the standard `log` package for operational logging.
+- Use tab indentation for Go files (as defined in `.editorconfig`).
+
+### Python Code
+
+- Use 4-space indentation (as defined in `.editorconfig`).
+- Include a `requirements.txt` for any new dependencies.
+
+### Code Review
+
+- All contributions go through code review via pull requests.
+- Reviewers will check for correctness, test coverage, adherence to these guidelines, and clarity of intent.
+- Be responsive to review feedback and keep discussions constructive.
+
+## AI Coding Assistants
+
+LocalAI follows the **same guidelines as the Linux kernel project** for AI-assisted contributions: <https://docs.kernel.org/process/coding-assistants.html>.
+
+The full policy for this repository lives in [`.agents/ai-coding-assistants.md`](.agents/ai-coding-assistants.md). Summary:
+
+- **AI agents MUST NOT add `Signed-off-by` tags.** Only humans can certify the Developer Certificate of Origin.
+- **AI agents MUST NOT add `Co-Authored-By` trailers** attributing themselves as co-authors.
+- **Attribute AI involvement with an `Assisted-by` trailer** in the commit message:
+
+  ```
+  Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
+  ```
+
+  Example: `Assisted-by: Claude:claude-opus-4-7 golangci-lint`
+
+  Basic development tools (git, go, make, editors) should not be listed.
+- **The human submitter is responsible** for reviewing, testing, and fully understanding every line of AI-generated code — including verifying that any referenced APIs, flags, or file paths actually exist in the tree.
+- Contributions must remain compatible with LocalAI's **MIT License**.

 ## Testing

-`make test` cannot handle all the model now. Please be sure to add a test case for the new features or the part was changed.
+All new features and bug fixes should include test coverage. The project uses [Ginkgo](https://onsi.github.io/ginkgo/) as its test framework.

-### Running AIO tests
+### Running unit tests

-All-In-One images has a set of tests that automatically verifies that most of the endpoints works correctly, a flow can be :
+```bash
+make test
+```
+
+This downloads test model fixtures, runs protobuf generation, and executes the full test suite including llama-gguf, TTS, and stable-diffusion tests. Note: some tests require model files to be downloaded, so the first run may take longer.
+
+To run tests for a specific package:
+
+```bash
+go test ./core/config/...
+go test ./pkg/model/...
+```
+
+To run a specific test by name using Ginkgo's `--focus` flag:
+
+```bash
+go run github.com/onsi/ginkgo/v2/ginkgo --focus="should load a model" -v -r ./core/
+```
+
+### Running end-to-end tests
+
+The e2e tests run LocalAI in a Docker container and exercise the API:
+
+```bash
+make test-e2e
+```
+
+### Running E2E container tests
+
+These tests build a standard LocalAI Docker image and run it with pre-configured model configs to verify that most endpoints work correctly:

 ```bash
 # Build the LocalAI docker image
-make DOCKER_IMAGE=local-ai docker
+make docker-build-e2e

-# Build the corresponding AIO image
-BASE_IMAGE=local-ai DOCKER_AIO_IMAGE=local-ai-aio:test make docker-aio
+# Run the e2e tests (uses model configs from tests/e2e-aio/models/)
+make e2e-aio
+```

-# Run the AIO e2e tests
-LOCALAI_IMAGE_TAG=test LOCALAI_IMAGE=local-ai-aio make run-e2e-aio
+### Testing backends
+
+To prepare and test extra (Python) backends:
+
+```bash
+make prepare-test-extra   # build Python backends for testing
+make test-extra           # run backend-specific tests
 ```

 ## Documentation

-We are welcome the contribution of the documents, please open new PR or create a new issue. The documentation is available under `docs/` https://github.com/mudler/LocalAI/tree/master/docs
- 
+We welcome contributions to the documentation. Please open a new PR or create a new issue. The documentation lives under [`docs/`](https://github.com/mudler/LocalAI/tree/master/docs).
+
+### Gallery YAML Schema
+
+LocalAI provides a JSON Schema for gallery model YAML files at:
+
+`core/schema/gallery-model.schema.json`
+
+This schema mirrors the internal gallery model configuration and can be used by editors (such as VS Code) to enable autocomplete, validation, and inline documentation when creating or modifying gallery files.
+
+To use it with the YAML language server, add the following comment at the top of a gallery YAML file:
+
+```yaml
+# yaml-language-server: $schema=../core/schema/gallery-model.schema.json
+```
+
 ## Community and Communication

 - You can reach out via the Github issue tracker.
--- a/542
+++ b/542
@@ -1,36 +1,211 @@
-ARG IMAGE_TYPE=extras
-ARG BASE_IMAGE=ubuntu:22.04
-ARG GRPC_BASE_IMAGE=${BASE_IMAGE}
+ARG BASE_IMAGE=ubuntu:24.04
 ARG INTEL_BASE_IMAGE=${BASE_IMAGE}
+ARG UBUNTU_CODENAME=noble
+# Optional alternate Ubuntu apt mirror(s). Empty = use upstream.
+# See .docker/apt-mirror.sh for accepted values.
+ARG APT_MIRROR=""
+ARG APT_PORTS_MIRROR=""
+
+FROM ${BASE_IMAGE} AS requirements
+
+ARG APT_MIRROR
+ARG APT_PORTS_MIRROR
+ENV DEBIAN_FRONTEND=noninteractive
+
+RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
+    APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \
+    apt-get update && \
+    apt-get install -y --no-install-recommends \
+        ca-certificates curl wget espeak-ng libgomp1 \
+        ffmpeg libopenblas0 libopenblas-dev libopus0 sox && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# The requirements-drivers target is for BUILD_TYPE specific items.  If you need to install something specific to CUDA, or specific to ROCM, it goes here.
+FROM requirements AS requirements-drivers
+
+ARG BUILD_TYPE
+ARG CUDA_MAJOR_VERSION=12
+ARG CUDA_MINOR_VERSION=0
+ARG SKIP_DRIVERS=false
+ARG TARGETARCH
+ARG TARGETVARIANT
+ENV BUILD_TYPE=${BUILD_TYPE}
+ARG UBUNTU_VERSION=2404
+
+RUN mkdir -p /run/localai
+RUN echo "default" > /run/localai/capability
+
+# Vulkan requirements
+RUN <<EOT bash
+    if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
+        apt-get update && \
+        apt-get install -y  --no-install-recommends \
+            software-properties-common pciutils wget gpg-agent && \
+        apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
+            libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
+            libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
+            git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
+            ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
+            clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils mesa-vulkan-drivers
+        if [ "amd64" = "$TARGETARCH" ]; then
+            wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
+            tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
+            rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
+            mkdir -p /opt/vulkan-sdk && \
+            mv 1.4.335.0 /opt/vulkan-sdk/ && \
+            cd /opt/vulkan-sdk/1.4.335.0 && \
+            ./vulkansdk --no-deps --maxjobs \
+                vulkan-loader \
+                vulkan-validationlayers \
+                vulkan-extensionlayer \
+                vulkan-tools \
+                shaderc && \
+            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
+            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
+            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
+            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
+            rm -rf /opt/vulkan-sdk
+        fi
+        if [ "arm64" = "$TARGETARCH" ]; then
+            mkdir vulkan && cd vulkan && \
+            curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
+            tar -xvf vulkan-sdk.tar.xz && \
+            rm vulkan-sdk.tar.xz && \
+            cd 1.4.335.0 && \
+            cp -rfv aarch64/bin/* /usr/bin/ && \
+            cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
+            cp -rfv aarch64/include/* /usr/include/ && \
+            cp -rfv aarch64/share/* /usr/share/ && \
+            cd ../.. && \
+            rm -rf vulkan
+        fi
+        ldconfig && \
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/* && \
+        echo "vulkan" > /run/localai/capability
+    fi
+EOT
+
+# CuBLAS requirements
+RUN <<EOT bash
+    if ( [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "l4t" ] ) && [ "${SKIP_DRIVERS}" = "false" ]; then
+        apt-get update && \
+        apt-get install -y  --no-install-recommends \
+            software-properties-common pciutils
+        if [ "amd64" = "$TARGETARCH" ]; then
+            curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
+        fi
+        if [ "arm64" = "$TARGETARCH" ]; then
+            if [ "${CUDA_MAJOR_VERSION}" = "13" ]; then
+                curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/sbsa/cuda-keyring_1.1-1_all.deb
+            else
+                curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/arm64/cuda-keyring_1.1-1_all.deb
+            fi
+        fi
+        dpkg -i cuda-keyring_1.1-1_all.deb && \
+        rm -f cuda-keyring_1.1-1_all.deb && \
+        apt-get update && \
+        apt-get install -y --no-install-recommends \
+            cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+            libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
+        if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
+            apt-get install -y --no-install-recommends \
+            libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
+        fi
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/* && \
+        echo "nvidia-cuda-${CUDA_MAJOR_VERSION}" > /run/localai/capability
+    fi
+EOT
+
+RUN <<EOT bash
+    if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
+        echo "nvidia-l4t-cuda-${CUDA_MAJOR_VERSION}" > /run/localai/capability
+    fi
+EOT
+
+# https://github.com/NVIDIA/Isaac-GR00T/issues/343
+RUN <<EOT bash
+    if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
+        wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
+        dpkg -i cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
+        cp /var/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
+        apt-get update && apt-get -y install cudss cudss-cuda-${CUDA_MAJOR_VERSION} && \
+        wget https://developer.download.nvidia.com/compute/nvpl/25.5/local_installers/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
+        dpkg -i nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
+        cp /var/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5/nvpl-*-keyring.gpg /usr/share/keyrings/ && \
+        apt-get update && apt-get install -y nvpl
+    fi
+EOT
+
+# If we are building with clblas support, we need the libraries for the builds
+RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
+        apt-get update && \
+        apt-get install -y --no-install-recommends \
+            libclblast-dev && \
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/* \
+    ; fi
+
+RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
+        apt-get update && \
+        apt-get install -y --no-install-recommends \
+            hipblas-dev \
+            hipblaslt-dev \
+            rocblas-dev && \
+        apt-get clean && \
+        rm -rf /var/lib/apt/lists/* && \
+        echo "amd" > /run/localai/capability && \
+        # I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
+        # to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
+        ldconfig \
+    ; fi
+
+RUN if [ "${BUILD_TYPE}" = "hipblas" ]; then \
+    ln -s /opt/rocm-**/lib/llvm/lib/libomp.so /usr/lib/libomp.so \
+    ; fi
+
+RUN expr "${BUILD_TYPE}" = intel && echo "intel" > /run/localai/capability || echo "not intel"
+
+# Cuda
+ENV PATH=/usr/local/cuda/bin:${PATH}
+
+# HipBLAS requirements
+ENV PATH=/opt/rocm/bin:${PATH}
+
+###################################
+###################################

 # The requirements-core target is common to all images.  It should not be placed in requirements-core unless every single build will use it.
-FROM ${BASE_IMAGE} AS requirements-core
+FROM requirements-drivers AS build-requirements

-USER root
-
-ARG GO_VERSION=1.22.6
-ARG CMAKE_VERSION=3.26.4
+ARG GO_VERSION=1.26.0
+ARG CMAKE_VERSION=3.31.10
 ARG CMAKE_FROM_SOURCE=false
 ARG TARGETARCH
 ARG TARGETVARIANT

-ENV DEBIAN_FRONTEND=noninteractive
-ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,transformers:/build/backend/python/transformers/run.sh,rerankers:/build/backend/python/rerankers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,faster-whisper:/build/backend/python/faster-whisper/run.sh,kokoro:/build/backend/python/kokoro/run.sh,vllm:/build/backend/python/vllm/run.sh,exllama2:/build/backend/python/exllama2/run.sh"
-
 RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        ccache \
-        ca-certificates \
+        ca-certificates espeak-ng \
        curl libssl-dev \
        git \
-        unzip upx-ucl && \
+        git-lfs \
+        libopus-dev pkg-config \
+        unzip upx-ucl python3 python-is-python3 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

 # Install CMake (the version in 22.04 is too old)
 RUN <<EOT bash
-    if [ "${CMAKE_FROM_SOURCE}}" = "true" ]; then
+    if [ "${CMAKE_FROM_SOURCE}" = "true" ]; then
        curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
    else
        apt-get update && \
@@ -59,124 +234,11 @@ RUN test -n "$TARGETARCH" \
 RUN echo "Target Architecture: $TARGETARCH"
 RUN echo "Target Variant: $TARGETVARIANT"

-# Cuda
-ENV PATH=/usr/local/cuda/bin:${PATH}

-# HipBLAS requirements
-ENV PATH=/opt/rocm/bin:${PATH}

-# OpenBLAS requirements and stable diffusion
-RUN apt-get update && \
-    apt-get install -y --no-install-recommends \
-        libopenblas-dev && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/*

 WORKDIR /build

-###################################
-###################################
-
-# The requirements-extras target is for any builds with IMAGE_TYPE=extras. It should not be placed in this target unless every IMAGE_TYPE=extras build will use it
-FROM requirements-core AS requirements-extras
-
-# Install uv as a system package
-RUN curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR=/usr/bin sh
-ENV PATH="/root/.cargo/bin:${PATH}"
-
-RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
-RUN apt-get update && \
-    apt-get install -y --no-install-recommends \
-        espeak-ng \
-        espeak \
-        python3-pip \
-        python-is-python3 \
-        python3-dev llvm \
-        python3-venv && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/* && \
-    pip install --upgrade pip
-
-# Install grpcio-tools (the version in 22.04 is too old)
-RUN pip install --user grpcio-tools
-
-###################################
-###################################
-
-# The requirements-drivers target is for BUILD_TYPE specific items.  If you need to install something specific to CUDA, or specific to ROCM, it goes here.
-# This target will be built on top of requirements-core or requirements-extras as retermined by the IMAGE_TYPE build-arg
-FROM requirements-${IMAGE_TYPE} AS requirements-drivers
-
-ARG BUILD_TYPE
-ARG CUDA_MAJOR_VERSION=12
-ARG CUDA_MINOR_VERSION=0
-ARG SKIP_DRIVERS=false
-
-ENV BUILD_TYPE=${BUILD_TYPE}
-
-# Vulkan requirements
-RUN <<EOT bash
-    if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
-        apt-get update && \
-        apt-get install -y  --no-install-recommends \
-            software-properties-common pciutils wget gpg-agent && \
-        wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
-        wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
-        apt-get update && \
-        apt-get install -y \
-            vulkan-sdk && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/*
-    fi
-EOT
-
-# CuBLAS requirements
-RUN <<EOT bash
-    if [ "${BUILD_TYPE}" = "cublas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
-        apt-get update && \
-        apt-get install -y  --no-install-recommends \
-            software-properties-common pciutils
-        if [ "amd64" = "$TARGETARCH" ]; then
-            curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
-        fi
-        if [ "arm64" = "$TARGETARCH" ]; then
-            curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb
-        fi
-        dpkg -i cuda-keyring_1.1-1_all.deb && \
-        rm -f cuda-keyring_1.1-1_all.deb && \
-        apt-get update && \
-        apt-get install -y --no-install-recommends \
-            cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/*
-    fi
-EOT
-
-# If we are building with clblas support, we need the libraries for the builds
-RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
-        apt-get update && \
-        apt-get install -y --no-install-recommends \
-            libclblast-dev && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/* \
-    ; fi
-
-RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
-        apt-get update && \
-        apt-get install -y --no-install-recommends \
-            hipblas-dev \
-            rocblas-dev && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/* && \
-        # I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
-        # to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
-        ldconfig \
-    ; fi

 ###################################
 ###################################
@@ -185,72 +247,33 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
 # https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/APT-Repository-not-working-signatures-invalid/m-p/1599436/highlight/true#M36143
 # This is a temporary workaround until Intel fixes their repository
 FROM ${INTEL_BASE_IMAGE} AS intel
+ARG UBUNTU_CODENAME=noble
+ARG APT_MIRROR
+ARG APT_PORTS_MIRROR
 RUN wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
 gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg
-RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy/lts/2350 unified" > /etc/apt/sources.list.d/intel-graphics.list
-
-###################################
-###################################
-
-# The grpc target does one thing, it builds and installs GRPC.  This is in it's own layer so that it can be effectively cached by CI.
-# You probably don't need to change anything here, and if you do, make sure that CI is adjusted so that the cache continues to work.
-FROM ${GRPC_BASE_IMAGE} AS grpc
-
-# This is a bit of a hack, but it's required in order to be able to effectively cache this layer in CI
-ARG GRPC_MAKEFLAGS="-j4 -Otarget"
-ARG GRPC_VERSION=v1.65.0
-ARG CMAKE_FROM_SOURCE=false
-ARG CMAKE_VERSION=3.26.4
-
-ENV MAKEFLAGS=${GRPC_MAKEFLAGS}
-
-WORKDIR /build
-
-RUN apt-get update && \
+RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu ${UBUNTU_CODENAME}/lts/2350 unified" > /etc/apt/sources.list.d/intel-graphics.list
+RUN --mount=type=bind,source=.docker/apt-mirror.sh,target=/usr/local/sbin/apt-mirror \
+    APT_MIRROR="${APT_MIRROR}" APT_PORTS_MIRROR="${APT_PORTS_MIRROR}" sh /usr/local/sbin/apt-mirror && \
+    apt-get update && \
    apt-get install -y --no-install-recommends \
-        ca-certificates \
-        build-essential curl libssl-dev \
-        git && \
+        intel-oneapi-runtime-libs && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

-# Install CMake (the version in 22.04 is too old)
-RUN <<EOT bash
-    if [ "${CMAKE_FROM_SOURCE}}" = "true" ]; then
-        curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
-    else
-        apt-get update && \
-        apt-get install -y \
-            cmake && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/*
-    fi
-EOT
-
-# We install GRPC to a different prefix here so that we can copy in only the build artifacts later
-# saves several hundred MB on the final docker image size vs copying in the entire GRPC source tree
-# and running make install in the target container
-RUN git clone --recurse-submodules --jobs 4 -b ${GRPC_VERSION} --depth 1 --shallow-submodules https://github.com/grpc/grpc && \
-    mkdir -p /build/grpc/cmake/build && \
-    cd /build/grpc/cmake/build && \
-    sed -i "216i\  TESTONLY" "../../third_party/abseil-cpp/absl/container/CMakeLists.txt" && \
-    cmake -DgRPC_INSTALL=ON -DgRPC_BUILD_TESTS=OFF -DCMAKE_INSTALL_PREFIX:PATH=/opt/grpc ../.. && \
-    make && \
-    make install && \
-    rm -rf /build
-
 ###################################
 ###################################

 # The builder-base target has the arguments, variables, and copies shared between full builder images and the uncompiled devcontainer

-FROM requirements-drivers AS builder-base
+FROM build-requirements AS builder-base

-ARG GO_TAGS="tts p2p"
+ARG GO_TAGS="auth"
 ARG GRPC_BACKENDS
 ARG MAKEFLAGS
 ARG LD_FLAGS="-s -w"
-
+ARG TARGETARCH
+ARG TARGETVARIANT
 ENV GRPC_BACKENDS=${GRPC_BACKENDS}
 ENV GO_TAGS=${GO_TAGS}
 ENV MAKEFLAGS=${MAKEFLAGS}
@@ -264,9 +287,7 @@ RUN echo "GO_TAGS: $GO_TAGS" && echo "TARGETARCH: $TARGETARCH"
 WORKDIR /build


-# We need protoc installed, and the version in 22.04 is too old.  We will create one as part installing the GRPC build below
-# but that will also being in a newer version of absl which stablediffusion cannot compile with.  This version of protoc is only
-# here so that we can generate the grpc code for the stablediffusion build
+# We need protoc installed, and the version in 22.04 is too old.
 RUN <<EOT bash
    if [ "amd64" = "$TARGETARCH" ]; then
        curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-x86_64.zip -o protoc.zip && \
@@ -283,35 +304,52 @@ EOT
 ###################################
 ###################################

+# Build React UI
+FROM node:25-slim AS react-ui-builder
+WORKDIR /app
+COPY core/http/react-ui/package*.json ./
+RUN npm install
+COPY core/http/react-ui/ ./
+RUN npm run build
+
+###################################
+###################################
+
+# Compile backends first in a separate stage
+FROM builder-base AS builder-backends
+ARG TARGETARCH
+ARG TARGETVARIANT
+
+WORKDIR /build
+
+COPY ./Makefile .
+COPY ./backend ./backend
+COPY ./go.mod .
+COPY ./go.sum .
+COPY ./.git ./.git
+
+# Some of the Go backends use libs from the main source tree. Caching could be optimized further by building the C++ backends before this point.
+COPY ./pkg/grpc ./pkg/grpc
+COPY ./pkg/utils ./pkg/utils
+
+RUN ls -l ./
+RUN make protogen-go
+
 # The builder target compiles LocalAI. This target is not the target that will be uploaded to the registry.
 # Adjustments to the build process should likely be made here.
-FROM builder-base AS builder
+FROM builder-backends AS builder

-# Install the pre-built GRPC
-COPY --from=grpc /opt/grpc /usr/local
-
-# Rebuild with defaults backends
 WORKDIR /build

 COPY . .
-COPY .git .

-RUN make prepare
+# Copy pre-built React UI
+COPY --from=react-ui-builder /app/dist ./core/http/react-ui/dist

 ## Build the binary
-## If it's CUDA or hipblas, we want to skip some of the llama-compat backends to save space
-## We only leave the most CPU-optimized variant and the fallback for the cublas/hipblas build
-## (both will use CUDA or hipblas for the actual computation)
-RUN if [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then \
-        SKIP_GRPC_BACKEND="backend-assets/grpc/llama-cpp-avx512 backend-assets/grpc/llama-cpp-avx backend-assets/grpc/llama-cpp-avx2" make build; \
-    else \
-        make build; \
-    fi
-
-RUN if [ ! -d "/build/sources/go-piper/piper-phonemize/pi/lib/" ]; then \
-        mkdir -p /build/sources/go-piper/piper-phonemize/pi/lib/ \
-        touch /build/sources/go-piper/piper-phonemize/pi/lib/keep \
-    ; fi
+## All build types and architectures now run the same build; the former
+## cublas/hipblas special case that skipped some llama-compat backends is gone
+RUN make build

 ###################################
 ###################################
@@ -321,24 +359,11 @@ RUN if [ ! -d "/build/sources/go-piper/piper-phonemize/pi/lib/" ]; then \

 FROM builder-base AS devcontainer

-ARG FFMPEG
-
-COPY --from=grpc /opt/grpc /usr/local
-
 COPY .devcontainer-scripts /.devcontainer-scripts

-# Add FFmpeg
-RUN if [ "${FFMPEG}" = "true" ]; then \
-        apt-get update && \
-        apt-get install -y --no-install-recommends \
-            ffmpeg && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/* \
-    ; fi
-
 RUN apt-get update && \
    apt-get install -y --no-install-recommends \
-        ssh less wget
+        ssh less
 # For the devcontainer, leave apt functional in case additional devtools are needed at runtime.

 RUN go install github.com/go-delve/delve/cmd/dlv@latest
@@ -352,101 +377,30 @@ RUN go install github.com/mikefarah/yq/v4@latest
 # If you cannot find a more suitable place for an addition, this layer is a suitable place for it.
 FROM requirements-drivers

-ARG FFMPEG
-ARG BUILD_TYPE
-ARG TARGETARCH
-ARG IMAGE_TYPE=extras
-ARG EXTRA_BACKENDS
-ARG MAKEFLAGS
-
-ENV BUILD_TYPE=${BUILD_TYPE}
-ENV REBUILD=false
 ENV HEALTHCHECK_ENDPOINT=http://localhost:8080/readyz
-ENV MAKEFLAGS=${MAKEFLAGS}

 ARG CUDA_MAJOR_VERSION=12
 ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
 ENV NVIDIA_REQUIRE_CUDA="cuda>=${CUDA_MAJOR_VERSION}.0"
 ENV NVIDIA_VISIBLE_DEVICES=all

-# Add FFmpeg
-RUN if [ "${FFMPEG}" = "true" ]; then \
-        apt-get update && \
-        apt-get install -y --no-install-recommends \
-            ffmpeg && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/* \
-    ; fi
+WORKDIR /

-WORKDIR /build
-
-# we start fresh & re-copy all assets because `make build` does not clean up nicely after itself
-# so when `entrypoint.sh` runs `make build` again (which it does by default), the build would fail
-# see https://github.com/go-skynet/LocalAI/pull/658#discussion_r1241971626 and
-# https://github.com/go-skynet/LocalAI/pull/434
-COPY . .
-
-COPY --from=builder /build/sources ./sources/
-COPY --from=grpc /opt/grpc /usr/local
-
-RUN make prepare-sources
+COPY ./entrypoint.sh .

 # Copy the binary
 COPY --from=builder /build/local-ai ./
-
-# Copy shared libraries for piper
-COPY --from=builder /build/sources/go-piper/piper-phonemize/pi/lib/* /usr/lib/
-
-# Change the shell to bash so we can use [[ tests below
-SHELL ["/bin/bash", "-c"]
-# We try to strike a balance between individual layer size (as that affects total push time) and total image size
-# Splitting the backends into more groups with fewer items results in a larger image, but a smaller size for the largest layer
-# Splitting the backends into fewer groups with more items results in a smaller image, but a larger size for the largest layer
-
-RUN if [[ ( "${IMAGE_TYPE}" == "extras ")]]; then \
-        apt-get -qq -y install espeak-ng \
-    ; fi
-
-RUN if [[ ( "${EXTRA_BACKENDS}" =~ "coqui" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
-        make -C backend/python/coqui \
-    ; fi && \
-    if [[ ( "${EXTRA_BACKENDS}" =~ "faster-whisper" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
-        make -C backend/python/faster-whisper \
-    ; fi && \
-    if [[ ( "${EXTRA_BACKENDS}" =~ "diffusers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
-        make -C backend/python/diffusers \
-    ; fi
-
-RUN if [[ ( "${EXTRA_BACKENDS}" =~ "kokoro" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
-        make -C backend/python/kokoro \
-    ; fi && \
-    if [[ ( "${EXTRA_BACKENDS}" =~ "exllama2" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
-        make -C backend/python/exllama2 \
-    ; fi && \
-    if [[ ( "${EXTRA_BACKENDS}" =~ "transformers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
-        make -C backend/python/transformers \
-    ; fi
-
-RUN if [[ ( "${EXTRA_BACKENDS}" =~ "vllm" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
-        make -C backend/python/vllm \
-    ; fi && \
-    if [[ ( "${EXTRA_BACKENDS}" =~ "autogptq" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
-        make -C backend/python/autogptq \
-    ; fi && \
-    if [[ ( "${EXTRA_BACKENDS}" =~ "bark" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
-        make -C backend/python/bark \
-    ; fi && \
-    if [[ ( "${EXTRA_BACKENDS}" =~ "rerankers" || -z "${EXTRA_BACKENDS}" ) && "$IMAGE_TYPE" == "extras" ]]; then \
-        make -C backend/python/rerankers \
-    ; fi
+# Copy the opus shim if it was built
+RUN --mount=from=builder,src=/build/,dst=/mnt/build \
+    if [ -f /mnt/build/libopusshim.so ]; then cp /mnt/build/libopusshim.so ./; fi

 # Make sure the models directory exists
-RUN mkdir -p /build/models
+RUN mkdir -p /models /backends /data

 # Define the health check command
 HEALTHCHECK --interval=1m --timeout=10m --retries=10 \
  CMD curl -f ${HEALTHCHECK_ENDPOINT} || exit 1

-VOLUME /build/models
+VOLUME /models /backends /configuration /data
 EXPOSE 8080
-ENTRYPOINT [ "/build/entrypoint.sh" ]
+ENTRYPOINT [ "/entrypoint.sh" ]
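
Note on the `APT_MIRROR` / `APT_PORTS_MIRROR` build args in the Dockerfile diff above: the script `.docker/apt-mirror.sh` is not part of this diff, so its behavior is not shown here. The sketch below is only an assumption about the kind of rewrite such a script might perform. The function name `rewrite_apt_sources` is hypothetical, and the hosts are the stock Ubuntu ones. As in the Dockerfile comment, an empty value means "use upstream".

```shell
#!/bin/sh
# Illustrative sketch only; NOT the contents of .docker/apt-mirror.sh.
# Usage: rewrite_apt_sources FILE MIRROR PORTS_MIRROR
# Replaces the upstream Ubuntu scheme+host in FILE and keeps the URL path.
# An empty MIRROR or PORTS_MIRROR leaves the corresponding host untouched.
rewrite_apt_sources() {
    file="$1"
    mirror="${2%/}"         # drop one trailing slash, if any
    ports_mirror="${3%/}"
    [ -f "$file" ] || return 0

    # Fall back to the upstream host so the substitution becomes a no-op.
    archive_to="${mirror:-http://archive.ubuntu.com}"
    security_to="${mirror:-http://security.ubuntu.com}"
    ports_to="${ports_mirror:-http://ports.ubuntu.com}"

    tmp="$(mktemp)" || return 1
    sed -e "s|http://archive.ubuntu.com|${archive_to}|g" \
        -e "s|http://security.ubuntu.com|${security_to}|g" \
        -e "s|http://ports.ubuntu.com|${ports_to}|g" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

On an Ubuntu 24.04 image the file to pass would likely be `/etc/apt/sources.list.d/ubuntu.sources`. The sketch assumes the mirror value contains no `|` character, since that is the `sed` delimiter; the values the real script accepts are documented in `.docker/apt-mirror.sh` itself.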
--- a/Dockerfile.aio
+++ b/Dockerfile.aio
@@ -1,8 +0,0 @@
-ARG BASE_IMAGE=ubuntu:22.04
-
-FROM ${BASE_IMAGE} 
-
-RUN apt-get update && apt-get install -y pciutils && apt-get clean
-
-COPY aio/ /aio
-ENTRYPOINT [ "/aio/entrypoint.sh" ]
--- a/5
+++ b/5
@@ -1,5 +0,0 @@
-VERSION 0.7
-
-build:
-    FROM DOCKERFILE -f Dockerfile .
-    SAVE ARTIFACT /usr/bin/local-ai AS LOCAL local-ai
--- a/2
+++ b/2
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2023-2024 Ettore Di Giacinto (mudler@localai.io)
+Copyright (c) 2023-2025 Ettore Di Giacinto (mudler@localai.io)

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
--- a/1650
+++ b/1650
--- a/README.md
+++ b/README.md
@@ -1,40 +1,27 @@
 <h1 align="center">
  <br>
-  <img height="300" src="https://github.com/go-skynet/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd"> <br>
-    LocalAI
+  <img width="300" src="./core/http/static/logo.png"> <br>
 <br>
 </h1>

 <p align="center">
-<a href="https://github.com/go-skynet/LocalAI/fork" target="blank">
-<img src="https://img.shields.io/github/forks/go-skynet/LocalAI?style=for-the-badge" alt="LocalAI forks"/>
-</a>
 <a href="https://github.com/go-skynet/LocalAI/stargazers" target="blank">
 <img src="https://img.shields.io/github/stars/go-skynet/LocalAI?style=for-the-badge" alt="LocalAI stars"/>
 </a>
-<a href="https://github.com/go-skynet/LocalAI/pulls" target="blank">
-<img src="https://img.shields.io/github/issues-pr/go-skynet/LocalAI?style=for-the-badge" alt="LocalAI pull-requests"/>
-</a>
 <a href='https://github.com/go-skynet/LocalAI/releases'>
 <img src='https://img.shields.io/github/release/go-skynet/LocalAI?&label=Latest&style=for-the-badge'>
 </a>
-</p>
-
-<p align="center">
-<a href="https://hub.docker.com/r/localai/localai" target="blank">
-<img src="https://img.shields.io/badge/dockerhub-images-important.svg?logo=Docker" alt="LocalAI Docker hub"/>
-</a>
-<a href="https://quay.io/repository/go-skynet/local-ai?tab=tags&tag=latest" target="blank">
-<img src="https://img.shields.io/badge/quay.io-images-important.svg?" alt="LocalAI Quay.io"/>
+<a href="LICENSE" target="blank">
+<img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge" alt="LocalAI License"/>
 </a>
 </p>

 <p align="center">
 <a href="https://twitter.com/LocalAI_API" target="blank">
-<img src="https://img.shields.io/twitter/follow/LocalAI_API?label=Follow: LocalAI_API&style=social" alt="Follow LocalAI_API"/>
+<img src="https://img.shields.io/badge/X-%23000000.svg?style=for-the-badge&logo=X&logoColor=white&label=LocalAI_API" alt="Follow LocalAI_API"/>
 </a>
 <a href="https://discord.gg/uJAeKSAGDy" target="blank">
-<img src="https://dcbadge.vercel.app/api/server/uJAeKSAGDy?style=flat-square&theme=default-inverted" alt="Join LocalAI Discord Community"/>
+<img src="https://img.shields.io/badge/dynamic/json?color=blue&label=Discord&style=for-the-badge&query=approximate_member_count&url=https%3A%2F%2Fdiscordapp.com%2Fapi%2Finvites%2FuJAeKSAGDy%3Fwith_counts%3Dtrue&logo=discord" alt="Join LocalAI Discord Community"/>
 </a>
 </p>

@@ -42,151 +29,186 @@
 <a href="https://trendshift.io/repositories/5539" target="_blank"><img src="https://trendshift.io/api/badge/repositories/5539" alt="mudler%2FLocalAI | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
 </p>

-> :bulb: Get help - [❓FAQ](https://localai.io/faq/) [💭Discussions](https://github.com/go-skynet/LocalAI/discussions) [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) [:book: Documentation website](https://localai.io/)
->
-> [💻 Quickstart](https://localai.io/basics/getting_started/) [🖼️ Models](https://models.localai.io/) [🚀 Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap) [🥽 Demo](https://demo.localai.io) [🌍 Explorer](https://explorer.localai.io) [🛫 Examples](https://github.com/mudler/LocalAI-examples) 
+**LocalAI** is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

-[![tests](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml)[![Build and Release](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml)[![build container images](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml)[![Bump dependencies](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml)[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/localai)](https://artifacthub.io/packages/search?repo=localai)
+- **Drop-in API compatibility** — OpenAI, Anthropic, ElevenLabs APIs
+- **36+ backends** — llama.cpp, vLLM, transformers, whisper, diffusers, MLX...
+- **Any hardware** — NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
+- **Multi-user ready** — API key auth, user quotas, role-based access
+- **Built-in AI agents** — autonomous agents with tool use, RAG, MCP, and skills
+- **Privacy-first** — your data never leaves your infrastructure

-**LocalAI** is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that’s compatible with OpenAI (Elevenlabs, Anthropic... ) API specifications for local AI inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families. Does not require GPU. It is created and maintained by [Ettore Di Giacinto](https://github.com/mudler).
+Created by [Ettore Di Giacinto](https://github.com/mudler) and maintained by the [LocalAI team](#team).

-![screen](https://github.com/mudler/LocalAI/assets/2420543/20b5ccd2-8393-44f0-aaf6-87a23806381e)
+> [:book: Documentation](https://localai.io/) | [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) | [💻 Quickstart](https://localai.io/basics/getting_started/) | [🖼️ Models](https://models.localai.io/) | [❓FAQ](https://localai.io/faq/)

-Run the installer script:
+## Guided tour
+
+https://github.com/user-attachments/assets/08cbb692-57da-48f7-963d-2e7b43883c18
+
+<details>
+
+<summary>
+Click to see more!
+</summary>
+
+#### User and auth
+
+https://github.com/user-attachments/assets/228fa9ad-81a3-4d43-bfb9-31557e14a36c
+
+#### Agents
+
+https://github.com/user-attachments/assets/6270b331-e21d-4087-a540-6290006b381a
+
+#### Usage metrics per user
+
+https://github.com/user-attachments/assets/cbb03379-23b4-4e3d-bd26-d152f057007f
+
+#### Fine-tuning and Quantization
+
+https://github.com/user-attachments/assets/5ba4ace9-d3df-4795-b7d4-b0b404ea71ee
+
+#### WebRTC
+
+https://github.com/user-attachments/assets/ed88e34c-fed3-4b83-8a67-4716a9feeb7b
+
+</details>
+
+## Quickstart
+
+### macOS
+
+<a href="https://github.com/mudler/LocalAI/releases/latest/download/LocalAI.dmg">
+  <img src="https://img.shields.io/badge/Download-macOS-blue?style=for-the-badge&logo=apple&logoColor=white" alt="Download LocalAI for macOS"/>
+</a>
+
+> **Note:** The DMG is not signed by Apple. After installing, run: `sudo xattr -d com.apple.quarantine /Applications/LocalAI.app`. See [#6268](https://github.com/mudler/LocalAI/issues/6268) for details.
+
+### Containers (Docker, Podman, ...)
+
+> Already ran LocalAI before? Use `docker start -i local-ai` to restart an existing container.
+
+#### CPU only:

 ```bash
-curl https://localai.io/install.sh | sh
+docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
 ```

-Or run with docker:
-```bash
-# CPU only image:
-docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-cpu
+#### NVIDIA GPU:

-# Nvidia GPU:
+```bash
+# CUDA 13
+docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
+
+# CUDA 12
 docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

-# CPU and GPU image (bigger size):
-docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
+# NVIDIA Jetson ARM64 (CUDA 12, for AGX Orin and similar)
+docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64

-# AIO images (it will pre-download a set of models ready for use, see https://localai.io/basics/container/)
-docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
+# NVIDIA Jetson ARM64 (CUDA 13, for DGX Spark)
+docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
 ```

-To load models:
+#### AMD GPU (ROCm):

 ```bash
-# From the model gallery (see available models with `local-ai models list`, in the WebUI from the model tab, or visiting https://models.localai.io)
+docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
+```
+
+#### Intel GPU (oneAPI):
+
+```bash
+docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel
+```
+
+#### Vulkan GPU:
+
+```bash
+docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
+```
+
+### Loading models
+
+```bash
+# From the model gallery (see available models with `local-ai models list` or at https://models.localai.io)
 local-ai run llama-3.2-1b-instruct:q4_k_m
-# Start LocalAI with the phi-2 model directly from huggingface
+# From Hugging Face
 local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
-# Install and run a model from the Ollama OCI registry
+# From the Ollama OCI registry
 local-ai run ollama://gemma:2b
-# Run a model from a configuration file
+# From a YAML config
 local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
-# Install and run a model from a standard OCI registry (e.g., Docker Hub)
+# From a standard OCI registry (e.g., Docker Hub)
 local-ai run oci://localai/phi-2:latest
 ```

-[💻 Getting started](https://localai.io/basics/getting_started/index.html)
+> **Automatic Backend Detection**: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see [GPU Acceleration](https://localai.io/features/gpu-acceleration/).

-## 📰 Latest project news
+For more details, see the [Getting Started guide](https://localai.io/basics/getting_started/).

- Jan 2025: LocalAI model release: https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.3, SANA support in diffusers: https://github.com/mudler/LocalAI/pull/4603
- Dec 2024: stablediffusion.cpp backend (ggml) added ( https://github.com/mudler/LocalAI/pull/4289 )
- Nov 2024: Bark.cpp backend added ( https://github.com/mudler/LocalAI/pull/4287 )
- Nov 2024: Voice activity detection models (**VAD**) added to the API: https://github.com/mudler/LocalAI/pull/4204
- Oct 2024: examples moved to [LocalAI-examples](https://github.com/mudler/LocalAI-examples)
- Aug 2024:  🆕 FLUX-1, [P2P Explorer](https://explorer.localai.io)
- July 2024: 🔥🔥 🆕 P2P Dashboard, LocalAI Federated mode and AI Swarms: https://github.com/mudler/LocalAI/pull/2723. P2P Global community pools: https://github.com/mudler/LocalAI/issues/3113
- May 2024: 🔥🔥 Decentralized P2P llama.cpp:  https://github.com/mudler/LocalAI/pull/2343 (peer2peer llama.cpp!) 👉 Docs  https://localai.io/features/distribute/
- May 2024: 🔥🔥 Distributed inferencing: https://github.com/mudler/LocalAI/pull/2324
- April 2024: Reranker API: https://github.com/mudler/LocalAI/pull/2121
+## Latest News

-Roadmap items: [List of issues](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)
+- **April 2026**: [Voice recognition](https://github.com/mudler/LocalAI/pull/9500), [Face recognition, identification & liveness detection](https://github.com/mudler/LocalAI/pull/9480), [Ollama API compatibility](https://github.com/mudler/LocalAI/pull/9284), [Video generation in stable-diffusion.ggml](https://github.com/mudler/LocalAI/pull/9420), [Backend versioning with auto-upgrade](https://github.com/mudler/LocalAI/pull/9315), [Pin models & load-on-demand toggle](https://github.com/mudler/LocalAI/pull/9309), [Universal model importer](https://github.com/mudler/LocalAI/pull/9466), new backends: [sglang](https://github.com/mudler/LocalAI/pull/9359), [ik-llama-cpp](https://github.com/mudler/LocalAI/pull/9326), [TurboQuant](https://github.com/mudler/LocalAI/pull/9355), [sam.cpp](https://github.com/mudler/LocalAI/pull/9288), [Kokoros](https://github.com/mudler/LocalAI/pull/9212), [qwen3tts.cpp](https://github.com/mudler/LocalAI/pull/9316), [tinygrad multimodal](https://github.com/mudler/LocalAI/pull/9364)
+- **March 2026**: [Agent management](https://github.com/mudler/LocalAI/pull/8820), [New React UI](https://github.com/mudler/LocalAI/pull/8772), [WebRTC](https://github.com/mudler/LocalAI/pull/8790), [MLX-distributed via P2P and RDMA](https://github.com/mudler/LocalAI/pull/8801), [MCP Apps, MCP Client-side](https://github.com/mudler/LocalAI/pull/8947)
+- **February 2026**: [Realtime API for audio-to-audio with tool calling](https://github.com/mudler/LocalAI/pull/6245), [ACE-Step 1.5 support](https://github.com/mudler/LocalAI/pull/8396)
+- **January 2026**: **LocalAI 3.10.0** — Anthropic API support, Open Responses API, video & image generation (LTX-2), unified GPU backends, tool streaming, Moonshine, Pocket-TTS. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v3.10.0)
+- **December 2025**: [Dynamic Memory Resource reclaimer](https://github.com/mudler/LocalAI/pull/7583), [Automatic multi-GPU model fitting (llama.cpp)](https://github.com/mudler/LocalAI/pull/7584), [Vibevoice backend](https://github.com/mudler/LocalAI/pull/7494)
+- **November 2025**: [Import models via URL](https://github.com/mudler/LocalAI/pull/7245), [Multiple chats and history](https://github.com/mudler/LocalAI/pull/7325)
+- **October 2025**: [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/) support for agentic capabilities
+- **September 2025**: New Launcher for macOS and Linux, extended backend support for Mac and NVIDIA L4T, MLX-Audio, WAN 2.2
+- **August 2025**: MLX, MLX-VLM, Diffusers, llama.cpp now supported on Apple Silicon
+- **July 2025**: All backends migrated outside the main binary — [lightweight, modular architecture](https://github.com/mudler/LocalAI/releases/tag/v3.2.0)

-## 🔥🔥 Hot topics (looking for help):
+For older news and full release notes, see [GitHub Releases](https://github.com/mudler/LocalAI/releases) and the [News page](https://localai.io/basics/news/).

- Multimodal with vLLM and Video understanding: https://github.com/mudler/LocalAI/pull/3729
- Realtime API https://github.com/mudler/LocalAI/issues/3714
- WebUI improvements: https://github.com/mudler/LocalAI/issues/2156
- Backends v2: https://github.com/mudler/LocalAI/issues/1126
- Improving UX v2: https://github.com/mudler/LocalAI/issues/1373
- Assistant API: https://github.com/mudler/LocalAI/issues/1273
- Vulkan: https://github.com/mudler/LocalAI/issues/1647
- Anthropic API: https://github.com/mudler/LocalAI/issues/1808
+## Features

-If you want to help and contribute, issues up for grabs: https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3A%22up+for+grabs%22
+- [Text generation](https://localai.io/features/text-generation/) (`llama.cpp`, `transformers`, `vllm` ... [and more](https://localai.io/model-compatibility/))
+- [Text to Audio](https://localai.io/features/text-to-audio/)
+- [Audio to Text](https://localai.io/features/audio-to-text/)
+- [Image generation](https://localai.io/features/image-generation)
+- [OpenAI-compatible tools API](https://localai.io/features/openai-functions/)
+- [Realtime API](https://localai.io/features/openai-realtime/) (Speech-to-speech)
+- [Embeddings generation](https://localai.io/features/embeddings/)
+- [Constrained grammars](https://localai.io/features/constrained_grammars/)
+- [Download models from Hugging Face](https://localai.io/models/)
+- [Vision API](https://localai.io/features/gpt-vision/)
+- [Object Detection](https://localai.io/features/object-detection/)
+- [Reranker API](https://localai.io/features/reranker/)
+- [P2P Inferencing](https://localai.io/features/distribute/)
+- [Distributed Mode](https://localai.io/features/distributed-mode/) — Horizontal scaling with PostgreSQL + NATS
+- [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/)
+- [Built-in Agents](https://localai.io/features/agents/) — Autonomous AI agents with tool use, RAG, skills, SSE streaming, and [Agent Hub](https://agenthub.localai.io)
+- [Backend Gallery](https://localai.io/backends/) — Install/remove backends on the fly via OCI images
+- Voice Activity Detection (Silero-VAD)
+- Integrated WebUI

-## 🚀 [Features](https://localai.io/features/)
+## Supported Backends & Acceleration

- 📖 [Text generation with GPTs](https://localai.io/features/text-generation/) (`llama.cpp`, `transformers`, `vllm` ... [:book: and more](https://localai.io/model-compatibility/index.html#model-compatibility-table))
- 🗣 [Text to Audio](https://localai.io/features/text-to-audio/)
- 🔈 [Audio to Text](https://localai.io/features/audio-to-text/) (Audio transcription with `whisper.cpp`)
- 🎨 [Image generation](https://localai.io/features/image-generation)
- 🔥 [OpenAI-alike tools API](https://localai.io/features/openai-functions/) 
- 🧠 [Embeddings generation for vector databases](https://localai.io/features/embeddings/)
- ✍️ [Constrained grammars](https://localai.io/features/constrained_grammars/)
- 🖼️ [Download Models directly from Huggingface ](https://localai.io/models/)
- 🥽 [Vision API](https://localai.io/features/gpt-vision/)
- 📈 [Reranker API](https://localai.io/features/reranker/)
- 🆕🖧 [P2P Inferencing](https://localai.io/features/distribute/)
- 🔊 Voice activity detection (Silero-VAD support)
- 🌍 Integrated WebUI!
+LocalAI supports **36+ backends** including llama.cpp, vLLM, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for **NVIDIA** (CUDA 12/13), **AMD** (ROCm), **Intel** (oneAPI/SYCL), **Apple Silicon** (Metal), **Vulkan**, and **NVIDIA Jetson** (L4T). All backends can be installed on the fly from the [Backend Gallery](https://localai.io/backends/).

-## 💻 Usage
+See the full [Backend & Model Compatibility Table](https://localai.io/model-compatibility/) and [GPU Acceleration guide](https://localai.io/features/gpu-acceleration/).

-Check out the [Getting started](https://localai.io/basics/getting_started/index.html) section in our documentation.
+## Resources

-### 🔗 Community and integrations
+- [Documentation](https://localai.io/)
+- [LLM fine-tuning guide](https://localai.io/docs/advanced/fine-tuning/)
+- [Build from source](https://localai.io/basics/build/)
+- [Kubernetes installation](https://localai.io/basics/getting_started/#run-localai-in-kubernetes)
+- [Integrations & community projects](https://localai.io/docs/integrations/)
+- [Installation video walkthrough](https://www.youtube.com/watch?v=cMVNnlqwfw4)
+- [Media & blog posts](https://localai.io/basics/news/#media-blogs-social)
+- [Examples](https://github.com/mudler/LocalAI-examples)

-Build and deploy custom containers:
- https://github.com/sozercan/aikit
+## Team

-WebUIs:
- https://github.com/Jirubizu/localai-admin
- https://github.com/go-skynet/LocalAI-frontend
- QA-Pilot(An interactive chat project that leverages LocalAI LLMs for rapid understanding and navigation of GitHub code repository) https://github.com/reid41/QA-Pilot
+LocalAI is maintained by a small team of humans, together with the wider community of contributors.

-Model galleries
- https://github.com/go-skynet/model-gallery
+- **[Ettore Di Giacinto](https://github.com/mudler)** — original author and project lead
+- **[Richard Palethorpe](https://github.com/richiejp)** — maintainer

-Other:
- Helm chart https://github.com/go-skynet/helm-charts
- VSCode extension https://github.com/badgooooor/localai-vscode-plugin
- Langchain: https://python.langchain.com/docs/integrations/providers/localai/
- Terminal utility https://github.com/djcopley/ShellOracle
- Local Smart assistant https://github.com/mudler/LocalAGI
- Home Assistant https://github.com/sammcj/homeassistant-localai / https://github.com/drndos/hass-openai-custom-conversation / https://github.com/valentinfrlch/ha-gpt4vision
- Discord bot https://github.com/mudler/LocalAGI/tree/main/examples/discord
- Slack bot https://github.com/mudler/LocalAGI/tree/main/examples/slack
- Shell-Pilot(Interact with LLM using LocalAI models via pure shell scripts on your Linux or MacOS system) https://github.com/reid41/shell-pilot
- Telegram bot https://github.com/mudler/LocalAI/tree/master/examples/telegram-bot
- Another Telegram Bot https://github.com/JackBekket/Hellper
- Auto-documentation https://github.com/JackBekket/Reflexia
- Github bot which answer on issues, with code and documentation as context https://github.com/JackBekket/GitHelper
- Github Actions: https://github.com/marketplace/actions/start-localai
- Examples: https://github.com/mudler/LocalAI/tree/master/examples/
-  
-
-### 🔗 Resources
-
- [LLM finetuning guide](https://localai.io/docs/advanced/fine-tuning/)
- [How to build locally](https://localai.io/basics/build/index.html)
- [How to install in Kubernetes](https://localai.io/basics/getting_started/index.html#run-localai-in-kubernetes)
- [Projects integrating LocalAI](https://localai.io/docs/integrations/)
- [How tos section](https://io.midori-ai.xyz/howtos/) (curated by our community)
-
-## :book: 🎥 [Media, Blogs, Social](https://localai.io/basics/news/#media-blogs-social)
-
- [Run Visual studio code with LocalAI (SUSE)](https://www.suse.com/c/running-ai-locally/)
- 🆕 [Run LocalAI on Jetson Nano Devkit](https://mudler.pm/posts/local-ai-jetson-nano-devkit/)
- [Run LocalAI on AWS EKS with Pulumi](https://www.pulumi.com/blog/low-code-llm-apps-with-local-ai-flowise-and-pulumi/)
- [Run LocalAI on AWS](https://staleks.hashnode.dev/installing-localai-on-aws-ec2-instance)
- [Create a slackbot for teams and OSS projects that answer to documentation](https://mudler.pm/posts/smart-slackbot-for-teams/)
- [LocalAI meets k8sgpt](https://www.youtube.com/watch?v=PKrDNuJ_dfE)
- [Question Answering on Documents locally with LangChain, LocalAI, Chroma, and GPT4All](https://mudler.pm/posts/localai-question-answering/)
- [Tutorial to use k8sgpt with LocalAI](https://medium.com/@tyler_97636/k8sgpt-localai-unlock-kubernetes-superpowers-for-free-584790de9b65)
+A huge thank you to everyone who contributes code, reviews PRs, files issues, and helps users in [Discord](https://discord.gg/uJAeKSAGDy) — LocalAI is a community-driven project and wouldn't exist without you. See the full [contributors list](https://github.com/mudler/LocalAI/graphs/contributors).

 ## Citation

@@ -202,7 +224,7 @@ If you utilize this repository, data in a downstream project, please consider ci
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},
 ```

-## ❤️ Sponsors
+## Sponsors

 > Do you find LocalAI useful?

@@ -212,24 +234,28 @@ A huge thank you to our generous sponsors who support this project covering CI e

 <p align="center">
  <a href="https://www.spectrocloud.com/" target="blank">
-    <img height="200" src="https://github.com/go-skynet/LocalAI/assets/2420543/68a6f3cb-8a65-4a4d-99b5-6417a8905512">
+    <img height="200" src="https://github.com/user-attachments/assets/72eab1dd-8b93-4fc0-9ade-84db49f24962">
  </a>
  <a href="https://www.premai.io/" target="blank">
    <img height="200" src="https://github.com/mudler/LocalAI/assets/2420543/42e4ca83-661e-4f79-8e46-ae43689683d6"> <br>
  </a>
 </p>

-## 🌟 Star history
+### Individual sponsors
+
+A special thanks to our individual sponsors. The full list is on [GitHub](https://github.com/sponsors/mudler) and [buymeacoffee](https://buymeacoffee.com/mudler). Special shout-out to [drikster80](https://github.com/drikster80) for being generous. Thank you, everyone!
+
+## Star history

 [![LocalAI Star history Chart](https://api.star-history.com/svg?repos=go-skynet/LocalAI&type=Date)](https://star-history.com/#go-skynet/LocalAI&Date)

-## 📖 License
+## License

-LocalAI is a community-driven project created by [Ettore Di Giacinto](https://github.com/mudler/).
+LocalAI is a community-driven project created by [Ettore Di Giacinto](https://github.com/mudler/) and maintained by the [LocalAI team](#team).

 MIT - Author Ettore Di Giacinto <mudler@localai.io>

-## 🙇 Acknowledgements
+## Acknowledgements

 LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

@@ -240,10 +266,11 @@ LocalAI couldn't have been built without the help of great software already avai
 - https://github.com/EdVince/Stable-Diffusion-NCNN
 - https://github.com/ggerganov/whisper.cpp
 - https://github.com/rhasspy/piper
+- [exo](https://github.com/exo-explore/exo) for the MLX distributed auto-parallel sharding implementation

-## 🤗 Contributors
+## Contributors

-This is a community project, a special thanks to our contributors! 🤗
+This is a community project. A special thanks to our contributors!
 <a href="https://github.com/go-skynet/LocalAI/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=go-skynet/LocalAI" />
 </a>
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -8,10 +8,24 @@ At LocalAI, we take the security of our software seriously. We understand the im

 We provide support and updates for certain versions of our software. The following table outlines which versions are currently supported with security updates:

-| Version | Supported          |
-| ------- | ------------------ |
-| > 2.0   | :white_check_mark: |
-| < 2.0   | :x:                |
+| Version Series | Support Level | Details |
+| -------------- | ------------- | ------- |
+| 3.x | :white_check_mark: Actively supported | Full security updates and bug fixes for the latest minor versions. |
+| 2.x | :warning: Security fixes only | Critical security patches only, until **December 31, 2025**. |
+| 1.x | :x: End-of-life (EOL) | No longer supported as of **January 1, 2024**. No security fixes will be provided. |
+
+### What each support level means
+
+- **Actively supported (3.x):** Receives all security updates, bug fixes, and new features. Users should stay on the latest 3.x minor release for the best protection.
+- **Security fixes only (2.x):** Receives only critical security patches (e.g., remote code execution, authentication bypass, data exposure). No bug fixes or new features. Support ends December 31, 2025.
+- **End-of-life (1.x):** No updates of any kind. Users on 1.x are strongly encouraged to upgrade immediately, as known vulnerabilities will not be patched.
+
+### Migrating from older versions
+
+If you are running an unsupported or soon-to-be-unsupported version, we recommend upgrading as soon as possible:
+
+- **From 1.x to 3.x:** Version 1.x reached end-of-life on January 1, 2024. Review the [release notes](https://github.com/mudler/LocalAI/releases) for breaking changes across major versions, and upgrade directly to the latest 3.x release.
+- **From 2.x to 3.x:** While 2.x still receives critical security patches until December 31, 2025, we recommend planning your migration to 3.x to benefit from ongoing improvements and full support.

 Please ensure that you are using a supported version to receive the latest security updates.
