Revert "Revert #1963 (#2056)"

This reverts commit af9e5a2d05.
2026-07-04 21:37:02 -04:00 · 2024-04-17 23:36:17 +02:00
1900 changed files with 58699 additions and 523978 deletions
--- a/.agents/adding-backends.md
+++ b/.agents/adding-backends.md
@@ -1,143 +0,0 @@
-# Adding a New Backend
-
-When adding a new backend to LocalAI, you need to update several files to ensure the backend is properly built, tested, and registered. Here's a step-by-step guide based on the pattern used for adding backends like `moonshine`:
-
-## 1. Create Backend Directory Structure
-
-Create the backend directory under the appropriate location:
- **Python backends**: `backend/python/<backend-name>/`
- **Go backends**: `backend/go/<backend-name>/`
- **C++ backends**: `backend/cpp/<backend-name>/`
-
-For Python backends, you'll typically need:
- `backend.py` - Main gRPC server implementation
- `Makefile` - Build configuration
- `install.sh` - Installation script for dependencies
- `protogen.sh` - Protocol buffer generation script
- `requirements.txt` - Python dependencies
- `run.sh` - Runtime script
- `test.py` / `test.sh` - Test files
-
-## 2. Add Build Configurations to `.github/workflows/backend.yml`
-
-Add build matrix entries for each platform/GPU type you want to support. Look at similar backends (e.g., `chatterbox`, `faster-whisper`) for reference.
-
-**Placement in file:**
- CPU builds: Add after other CPU builds (e.g., after `cpu-chatterbox`)
- CUDA 12 builds: Add after other CUDA 12 builds (e.g., after `gpu-nvidia-cuda-12-chatterbox`)
- CUDA 13 builds: Add after other CUDA 13 builds (e.g., after `gpu-nvidia-cuda-13-chatterbox`)
-
-**Additional build types you may need:**
- ROCm/HIP: Use `build-type: 'hipblas'` with `base-image: "rocm/dev-ubuntu-24.04:6.4.4"`
- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"`
- L4T (ARM): Use `build-type: 'l4t'` with `platforms: 'linux/arm64'` and `runs-on: 'ubuntu-24.04-arm'`
-
-## 3. Add Backend Metadata to `backend/index.yaml`
-
-**Step 3a: Add Meta Definition**
-
-Add a YAML anchor definition in the `## metas` section (around lines 2-300). Use a similar backend, such as `diffusers` or `chatterbox`, as a template.
-
-**Step 3b: Add Image Entries**
-
-Add image entries at the end of the file, following the pattern of similar backends such as `diffusers` or `chatterbox`. Include both `latest` (production) and `master` (development) tags.
-
-## 4. Update the Makefile
-
-The Makefile needs to be updated in several places to support building and testing the new backend:
-
-**Step 4a: Add to `.NOTPARALLEL`**
-
-Add `backends/<backend-name>` to the `.NOTPARALLEL` line (around line 2) to prevent parallel execution conflicts:
-
-```makefile
-.NOTPARALLEL: ... backends/<backend-name>
-```
-
-**Step 4b: Add to `prepare-test-extra`**
-
-Add the backend to the `prepare-test-extra` target (around line 312) to prepare it for testing:
-
-```makefile
-prepare-test-extra: protogen-python
-	...
-	$(MAKE) -C backend/python/<backend-name>
-```
-
-**Step 4c: Add to `test-extra`**
-
-Add the backend to the `test-extra` target (around line 319) to run its tests:
-
-```makefile
-test-extra: prepare-test-extra
-	...
-	$(MAKE) -C backend/python/<backend-name> test
-```
-
-**Step 4d: Add Backend Definition**
-
-Add a backend definition variable in the backend definitions section (around line 428-457). The format depends on the backend type:
-
-**For Python backends with root context** (like `faster-whisper`, `coqui`):
-```makefile
-BACKEND_<BACKEND_NAME> = <backend-name>|python|.|false|true
-```
-
-**For Python backends with `./backend` context** (like `chatterbox`, `moonshine`):
-```makefile
-BACKEND_<BACKEND_NAME> = <backend-name>|python|./backend|false|true
-```
-
-**For Go backends**:
-```makefile
-BACKEND_<BACKEND_NAME> = <backend-name>|golang|.|false|true
-```
-
-**Step 4e: Generate Docker Build Target**
-
-Add an eval call to generate the docker-build target (around line 480-501):
-
-```makefile
-$(eval $(call generate-docker-build-target,$(BACKEND_<BACKEND_NAME>)))
-```
-
-**Step 4f: Add to `docker-build-backends`**
-
-Add `docker-build-<backend-name>` to the `docker-build-backends` target (around line 507):
-
-```makefile
-docker-build-backends: ... docker-build-<backend-name>
-```
-
-**Determining the Context:**
-
- If the backend is in `backend/python/<backend-name>/` and uses `./backend` as context in the workflow file, use `./backend` context
- If the backend is in `backend/python/<backend-name>/` but uses `.` as context in the workflow file, use `.` context
- Check similar backends to determine the correct context
-
-## 5. Verification Checklist
-
-After adding a new backend, verify:
-
- [ ] Backend directory structure is complete with all necessary files
- [ ] Build configurations added to `.github/workflows/backend.yml` for all desired platforms
- [ ] Meta definition added to `backend/index.yaml` in the `## metas` section
- [ ] Image entries added to `backend/index.yaml` for all build variants (latest + development)
- [ ] Tag suffixes match between workflow file and index.yaml
- [ ] Makefile updated with all 6 required changes (`.NOTPARALLEL`, `prepare-test-extra`, `test-extra`, backend definition, docker-build target eval, `docker-build-backends`)
- [ ] No YAML syntax errors (check with linter)
- [ ] No Makefile syntax errors (check with linter)
- [ ] Follows the same pattern as similar backends (e.g., if it's a transcription backend, follow `faster-whisper` pattern)
-
-## 6. Example: Adding a Python Backend
-
-For reference, when `moonshine` was added:
- **Files created**: `backend/python/moonshine/{backend.py, Makefile, install.sh, protogen.sh, requirements.txt, run.sh, test.py, test.sh}`
- **Workflow entries**: 3 build configurations (CPU, CUDA 12, CUDA 13)
- **Index entries**: 1 meta definition + 6 image entries (cpu, cuda12, cuda13 x latest/development)
- **Makefile updates**:
-  - Added to `.NOTPARALLEL` line
-  - Added to `prepare-test-extra` and `test-extra` targets
-  - Added `BACKEND_MOONSHINE = moonshine|python|./backend|false|true`
-  - Added eval for docker-build target generation
-  - Added `docker-build-moonshine` to `docker-build-backends`
--- a/.agents/api-endpoints-and-auth.md
+++ b/.agents/api-endpoints-and-auth.md
@@ -1,259 +0,0 @@
-# API Endpoints and Authentication
-
-This guide covers how to add new API endpoints and properly integrate them with the auth/permissions system.
-
-## Architecture overview
-
-Authentication and authorization flow through three layers:
-
-1. **Global auth middleware** (`core/http/auth/middleware.go` → `auth.Middleware`) — applied to every request in `core/http/app.go`. Handles session cookies, Bearer tokens, API keys, and legacy API keys. Populates `auth_user` and `auth_role` in the Echo context.
-2. **Feature middleware** (`auth.RequireFeature`) — per-feature access control applied to route groups or individual routes. Checks if the authenticated user has the specific feature enabled.
-3. **Admin middleware** (`auth.RequireAdmin`) — restricts endpoints to admin users only.
-
-When auth is disabled (no auth DB, no legacy API keys), all middleware becomes pass-through (`auth.NoopMiddleware`).
-
-## Adding a new API endpoint
-
-### Step 1: Create the handler
-
-Write the endpoint handler in the appropriate package under `core/http/endpoints/`. Follow existing patterns:
-
-```go
-// core/http/endpoints/localai/my_feature.go
-func MyFeatureEndpoint(app *application.Application) echo.HandlerFunc {
-    return func(c echo.Context) error {
-        // Use auth.GetUser(c) to get the authenticated user (may be nil if auth is disabled)
-        user := auth.GetUser(c)
-
-        // Your logic here
-        return c.JSON(http.StatusOK, result)
-    }
-}
-```
-
-### Step 2: Register routes
-
-Add routes in the appropriate file under `core/http/routes/`. The file you use depends on the endpoint category:
-
-| File | Category |
-|------|----------|
-| `routes/openai.go` | OpenAI-compatible API endpoints (`/v1/...`) |
-| `routes/localai.go` | LocalAI-specific endpoints (`/api/...`, `/models/...`, `/backends/...`) |
-| `routes/agents.go` | Agent pool endpoints (`/api/agents/...`) |
-| `routes/auth.go` | Auth endpoints (`/api/auth/...`) |
-| `routes/ui_api.go` | UI backend API endpoints |
-
-### Step 3: Apply the right middleware
-
-Choose the appropriate protection level:
-
-#### No auth required (public)
-Exempt paths bypass auth entirely. Add to `isExemptPath()` in `middleware.go` or use the `/api/auth/` prefix (always exempt). Use sparingly — most endpoints should require auth.
-
-#### Standard auth (any authenticated user)
-The global middleware already handles this. API paths (`/api/`, `/v1/`, etc.) automatically require authentication when auth is enabled. You don't need to add any extra middleware.
-
-```go
-router.GET("/v1/my-endpoint", myHandler)  // auth enforced by global middleware
-```
-
-#### Admin only
-Pass `adminMiddleware` to the route. This is set up in `app.go` and passed to `Register*Routes` functions:
-
-```go
-// In the Register function signature, accept the middleware:
-func RegisterMyRoutes(router *echo.Echo, app *application.Application, adminMiddleware echo.MiddlewareFunc) {
-    router.POST("/models/apply", myHandler, adminMiddleware)
-}
-```
-
-#### Feature-gated
-For endpoints that should be toggleable per-user, use feature middleware. There are two approaches:
-
-**Approach A: Route-level middleware** (preferred for groups of related endpoints)
-
-```go
-// In app.go, create the feature middleware:
-myFeatureMw := auth.RequireFeature(application.AuthDB(), auth.FeatureMyFeature)
-
-// Pass it to the route registration function:
-routes.RegisterMyRoutes(e, app, myFeatureMw)
-
-// In the routes file, apply to a group:
-g := e.Group("/api/my-feature", myFeatureMw)
-g.GET("", listHandler)
-g.POST("", createHandler)
-```
-
-**Approach B: RouteFeatureRegistry** (preferred for individual OpenAI-compatible endpoints)
-
-Add an entry to `RouteFeatureRegistry` in `core/http/auth/features.go`. The `RequireRouteFeature` global middleware will automatically enforce it:
-
-```go
-var RouteFeatureRegistry = []RouteFeature{
-    // ... existing entries ...
-    {"POST", "/v1/my-endpoint", FeatureMyFeature},
-}
-```
-
-## Adding a new feature
-
-When you need a new toggleable feature (not just a new endpoint under an existing feature):
-
-### 1. Define the feature constant
-
-Add to `core/http/auth/permissions.go`:
-
-```go
-const (
-    // Add to the appropriate group:
-    // Agent features (default OFF for new users)
-    FeatureMyFeature = "my_feature"
-
-    // OR API features (default ON for new users)
-    FeatureMyFeature = "my_feature"
-)
-```
-
-Then add it to the appropriate slice:
-
-```go
-// Default OFF — user must be explicitly granted access:
-var AgentFeatures = []string{..., FeatureMyFeature}
-
-// Default ON — user has access unless explicitly revoked:
-var APIFeatures = []string{..., FeatureMyFeature}
-```
-
-### 2. Add feature metadata
-
-In `core/http/auth/features.go`, add to the appropriate `FeatureMetas` function so the admin UI can display it:
-
-```go
-func AgentFeatureMetas() []FeatureMeta {
-    return []FeatureMeta{
-        // ... existing ...
-        {FeatureMyFeature, "My Feature", false},  // false = default OFF
-    }
-}
-```
-
-### 3. Wire up the middleware
-
-In `core/http/app.go`:
-
-```go
-myFeatureMw := auth.RequireFeature(application.AuthDB(), auth.FeatureMyFeature)
-```
-
-Then pass it to the route registration function.
-
-### 4. Register route-feature mappings (if applicable)
-
-If your feature gates standard API endpoints (like `/v1/...`), add entries to `RouteFeatureRegistry` in `features.go` instead of using per-route middleware.
-
-## Accessing the authenticated user in handlers
-
-```go
-import "github.com/mudler/LocalAI/core/http/auth"
-
-func MyHandler(c echo.Context) error {
-    // Get the user (nil when auth is disabled or unauthenticated)
-    user := auth.GetUser(c)
-    if user == nil {
-        // Handle unauthenticated — or let middleware handle it
-    }
-
-    // Check role; guard against a nil user (e.g. when auth is disabled)
-    if user != nil && user.Role == auth.RoleAdmin {
-        // admin-specific logic
-    }
-
-    // Check feature access programmatically (when you need conditional behavior, not full blocking)
-    if auth.HasFeatureAccess(db, user, auth.FeatureMyFeature) {
-        // feature-specific logic
-    }
-
-    // Check model access
-    if !auth.IsModelAllowed(db, user, modelName) {
-        return c.JSON(http.StatusForbidden, ...)
-    }
-
-    return nil
-}
-```
-
-## Middleware composition patterns
-
-Middleware can be composed at different levels. Here are the patterns used in the codebase:
-
-### Group-level middleware (agents pattern)
-```go
-// All routes in the group share the middleware
-g := e.Group("/api/agents", poolReadyMw, agentsMw)
-g.GET("", listHandler)
-g.POST("", createHandler)
-```
-
-### Per-route middleware (localai pattern)
-```go
-// Individual routes get middleware as extra arguments
-router.POST("/models/apply", applyHandler, adminMiddleware)
-router.GET("/metrics", metricsHandler, adminMiddleware)
-```
-
-### Middleware slice (openai pattern)
-```go
-// Build a middleware chain for a handler
-chatMiddleware := []echo.MiddlewareFunc{
-    usageMiddleware,
-    traceMiddleware,
-    modelFilterMiddleware,
-}
-app.POST("/v1/chat/completions", chatHandler, chatMiddleware...)
-```
-
-## Error response format
-
-Always use `schema.ErrorResponse` for auth/permission errors to stay consistent with the OpenAI-compatible API:
-
-```go
-return c.JSON(http.StatusForbidden, schema.ErrorResponse{
-    Error: &schema.APIError{
-        Message: "feature not enabled for your account",
-        Code:    http.StatusForbidden,
-        Type:    "authorization_error",
-    },
-})
-```
-
-Use these HTTP status codes:
- `401 Unauthorized` — no valid credentials provided
- `403 Forbidden` — authenticated but lacking permission
- `429 Too Many Requests` — rate limited (auth endpoints)
-
-## Usage tracking
-
-If your endpoint should be tracked for usage (token counts, request counts), add the `usageMiddleware` to its middleware chain. See `core/http/middleware/usage.go` and how it's applied in `routes/openai.go`.
-
-## Path protection rules
-
-The global auth middleware classifies paths as API paths or non-API paths:
-
- **API paths** (always require auth when auth is enabled): `/api/`, `/v1/`, `/models/`, `/backends/`, `/backend/`, `/tts`, `/vad`, `/video`, `/stores/`, `/system`, `/ws/`, `/metrics`
- **Exempt paths** (never require auth): `/api/auth/` prefix, anything in `appConfig.PathWithoutAuth`
- **Non-API paths** (UI, static assets): pass through without auth — the React UI handles login redirects client-side
-
-If you add endpoints under a new top-level path prefix, add it to `isAPIPath()` in `middleware.go` to ensure it requires authentication.
-
-## Checklist
-
-When adding a new endpoint:
-
- [ ] Handler in `core/http/endpoints/`
- [ ] Route registered in appropriate `core/http/routes/` file
- [ ] Auth level chosen: public / standard / admin / feature-gated
- [ ] If feature-gated: constant in `permissions.go`, metadata in `features.go`, middleware in `app.go`
- [ ] If new path prefix: added to `isAPIPath()` in `middleware.go`
- [ ] If OpenAI-compatible: entry in `RouteFeatureRegistry`
- [ ] If token-counting: `usageMiddleware` added to middleware chain
- [ ] Error responses use `schema.ErrorResponse` format
- [ ] Tests cover both authenticated and unauthenticated access
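The path protection rules described above can be sketched as a small classifier. This is a stdlib-only illustration, not the code in `middleware.go`: the function name `requiresAuth`, its parameters, and the exact-match handling of `PathWithoutAuth` entries are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// apiPrefixes mirrors the API path prefixes listed in the guide.
var apiPrefixes = []string{
	"/api/", "/v1/", "/models/", "/backends/", "/backend/",
	"/tts", "/vad", "/video", "/stores/", "/system", "/ws/", "/metrics",
}

// requiresAuth reports whether a request path would need credentials.
// pathsWithoutAuth stands in for appConfig.PathWithoutAuth; exact matching
// is an assumption, the real matching rule may differ.
func requiresAuth(path string, authEnabled bool, pathsWithoutAuth []string) bool {
	if !authEnabled {
		return false // all middleware is pass-through when auth is disabled
	}
	if strings.HasPrefix(path, "/api/auth/") {
		return false // auth endpoints are always exempt
	}
	for _, p := range pathsWithoutAuth {
		if path == p {
			return false
		}
	}
	for _, prefix := range apiPrefixes {
		if strings.HasPrefix(path, prefix) {
			return true
		}
	}
	return false // UI and static assets pass through
}

func main() {
	for _, p := range []string{"/v1/chat/completions", "/api/auth/login", "/index.html"} {
		fmt.Printf("%-24s requires auth: %v\n", p, requiresAuth(p, true, nil))
	}
}
```

The exemption checks run before the prefix check because `/api/auth/` would otherwise match the `/api/` prefix.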
--- a/.agents/building-and-testing.md
+++ b/.agents/building-and-testing.md
@@ -1,16 +0,0 @@
-# Build and Testing
-
-Building and testing the project depends on the components involved and the platform where development is taking place. Due to the amount of context required, it's usually best not to try building or testing the project unless the user requests it. If you must build the project, inspect the Makefile in the project root and the Makefiles of any backends that are affected by the changes you are making. In addition, the workflows in `.github/workflows` can be used as a reference when it is unclear how to build or test a component. The primary Makefile contains targets for building inside or outside Docker; if the user has not previously specified a preference, ask which they would like to use.
-
-## Building a specified backend
-
-Suppose the user wants to build a particular backend for a given platform, for example coqui for ROCm/hipblas:
-
- The Makefile has targets like `docker-build-coqui` created with `generate-docker-build-target` at the time of writing. Recently added backends may require a new target.
- At a minimum, we need to set the `BUILD_TYPE` and `BASE_IMAGE` build-args
-  - Use `.github/workflows/backend.yml` as a reference; it lists the needed args in the `include` job strategy matrix
-  - l4t and cublas also require the CUDA major and minor version
- You can pretty print a command like `DOCKER_MAKEFLAGS=-j$(nproc --ignore=1) BUILD_TYPE=hipblas BASE_IMAGE=rocm/dev-ubuntu-24.04:6.4.4 make docker-build-coqui`
- Unless the user asks you to run the command, just print it: not all agent frontends handle long-running jobs well, and the output may overflow your context
- The user may say they want to build for AMD or ROCm instead of hipblas, Intel instead of SYCL, or NVIDIA instead of l4t or cublas. Ask for confirmation if there is ambiguity.
- Sometimes the user may need extra parameters to be added to `docker build` (e.g. `--platform` for cross-platform builds or `--progress` to view the full logs), in which case you can generate the `docker build` command directly.
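The build-type resolution and command printing described above can be sketched as follows. The alias table and function names are assumptions made for illustration. The build types are the ones named in these guides, and the sketch does not cover the extra CUDA version arguments that l4t and cublas need.

```go
package main

import (
	"fmt"
	"strings"
)

// buildTypeAliases maps vendor terms a user might say to candidate build
// types. The table is an illustrative assumption, not a project constant.
var buildTypeAliases = map[string][]string{
	"amd":    {"hipblas"},
	"rocm":   {"hipblas"},
	"intel":  {"intel", "sycl_f16", "sycl_f32"},
	"nvidia": {"cublas", "l4t"},
}

// resolveBuildType returns a build type when the term is unambiguous.
// Otherwise it returns the candidates so the caller can ask the user.
// Unknown terms are assumed to already be build types.
func resolveBuildType(term string) (buildType string, candidates []string) {
	key := strings.ToLower(strings.TrimSpace(term))
	options, ok := buildTypeAliases[key]
	if !ok {
		return key, nil
	}
	if len(options) == 1 {
		return options[0], nil
	}
	return "", options
}

// dockerBuildCommand formats the make invocation; it only builds the string
// and never runs it, matching the "print, don't run" guidance.
func dockerBuildCommand(backend, buildType, baseImage string) string {
	return fmt.Sprintf(
		"DOCKER_MAKEFLAGS=-j$(nproc --ignore=1) BUILD_TYPE=%s BASE_IMAGE=%s make docker-build-%s",
		buildType, baseImage, backend)
}

func main() {
	if bt, candidates := resolveBuildType("AMD"); bt != "" {
		fmt.Println(dockerBuildCommand("coqui", bt, "rocm/dev-ubuntu-24.04:6.4.4"))
	} else {
		fmt.Println("ambiguous, ask the user to pick one of:", candidates)
	}
}
```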
--- a/.agents/coding-style.md
+++ b/.agents/coding-style.md
@@ -1,52 +0,0 @@
-# Coding Style
-
-The project has the following .editorconfig:
-
-```
-root = true
-
-[*]
-indent_style = space
-indent_size = 2
-end_of_line = lf
-charset = utf-8
-trim_trailing_whitespace = true
-insert_final_newline = true
-
-[*.go]
-indent_style = tab
-
-[Makefile]
-indent_style = tab
-
-[*.proto]
-indent_size = 2
-
-[*.py]
-indent_size = 4
-
-[*.js]
-indent_size = 2
-
-[*.yaml]
-indent_size = 2
-
-[*.md]
-trim_trailing_whitespace = false
-```
-
- Use comments sparingly to explain why code does something, not what it does. Comments are there to add context that would be difficult to deduce from reading the code.
- Prefer modern Go e.g. use `any` not `interface{}`
-
-## Logging
-
-Use `github.com/mudler/xlog` for logging; it has the same API as `slog`.
-
-## Documentation
-
-The project documentation is located in `docs/content`. When adding new features or changing existing functionality, it is crucial to update the documentation to reflect these changes. This helps users understand how to use the new capabilities and ensures the documentation stays relevant.
-
- **Feature Documentation**: If you add a new feature (like a new backend or API endpoint), create a new markdown file in `docs/content/features/` explaining what it is, how to configure it, and how to use it.
- **Configuration**: If you modify configuration options, update the relevant sections in `docs/content/`.
- **Examples**: Providing concrete examples (like YAML configuration blocks) is highly encouraged to help users get started quickly.
- **Shortcodes**: Use `{{% notice note %}}`, `{{% notice tip %}}`, or `{{% notice warning %}}` for callout boxes. Do **not** use `{{% alert %}}` — that shortcode does not exist in this project's Hugo theme and will break the docs build.
--- a/.agents/debugging-backends.md
+++ b/.agents/debugging-backends.md
@@ -1,141 +0,0 @@
-# Debugging and Rebuilding Backends
-
-When a backend fails at runtime (e.g. a gRPC method error, a Python import error, or a dependency conflict), use this guide to diagnose, fix, and rebuild.
-
-## Architecture Overview
-
- **Source directory**: `backend/python/<name>/` (or `backend/go/<name>/`, `backend/cpp/<name>/`)
- **Installed directory**: `backends/<name>/` — this is what LocalAI actually runs. It is populated by `make backends/<name>` which builds a Docker image, exports it, and installs it via `local-ai backends install`.
- **Virtual environment**: `backends/<name>/venv/` — the installed Python venv (for Python backends). The Python binary is at `backends/<name>/venv/bin/python`.
-
-Editing files in `backend/python/<name>/` does **not** affect the running backend until you rebuild with `make backends/<name>`.
-
-## Diagnosing Failures
-
-### 1. Check the logs
-
-Backend gRPC processes log to LocalAI's stdout/stderr. Look for lines tagged with the backend's model ID:
-
-```
-GRPC stderr id="trl-finetune-127.0.0.1:37335" line="..."
-```
-
-Common error patterns:
- **"Method not implemented"** — the backend is missing a gRPC method that the Go side calls. The model loader (`pkg/model/initializers.go`) always calls `LoadModel` after `Health`; fine-tuning backends must implement it even as a no-op stub.
- **Python import errors / `AttributeError`** — usually a dependency version mismatch (e.g. `pyarrow` removing `PyExtensionType`).
- **"failed to load backend"** — the gRPC process crashed or never started. Check stderr lines for the traceback.
-
-### 2. Test the Python environment directly
-
-You can run the installed venv's Python to check imports without starting the full server:
-
-```bash
-backends/<name>/venv/bin/python -c "import datasets; print(datasets.__version__)"
-```
-
-If `pip` is missing from the venv, bootstrap it:
-
-```bash
-backends/<name>/venv/bin/python -m ensurepip
-```
-
-Then use `backends/<name>/venv/bin/python -m pip install ...` to test fixes in the installed venv before committing them to the source requirements.
-
-### 3. Check upstream dependency constraints
-
-When you hit a dependency conflict, check what the main library expects. For example, TRL's upstream `requirements.txt`:
-
-```
-https://github.com/huggingface/trl/blob/main/requirements.txt
-```
-
-Pin minimum versions in the backend's requirements files to match upstream.
-
-## Common Fixes
-
-### Missing gRPC methods
-
-If the Go side calls a method the backend doesn't implement (e.g. `LoadModel`), add a no-op stub in `backend.py`:
-
-```python
-def LoadModel(self, request, context):
-    """No-op — actual loading happens elsewhere."""
-    return backend_pb2.Result(success=True, message="OK")
-```
-
-The gRPC contract requires `LoadModel` to succeed for the model loader to return a usable client, even if the backend doesn't need upfront model loading.
-
-### Dependency version conflicts
-
-Python backends often break when a transitive dependency releases a breaking change (e.g. `pyarrow` removing `PyExtensionType`). Steps:
-
-1. Identify the broken import in the logs
-2. Test in the installed venv: `backends/<name>/venv/bin/python -c "import <module>"`
-3. Check upstream requirements for version constraints
-4. Update **all** requirements files in `backend/python/<name>/`:
-   - `requirements.txt` — base deps (grpcio, protobuf)
-   - `requirements-cpu.txt` — CPU-specific (includes PyTorch CPU index)
-   - `requirements-cublas12.txt` — CUDA 12
-   - `requirements-cublas13.txt` — CUDA 13
-5. Rebuild: `make backends/<name>`
-
-### PyTorch index conflicts (uv resolver)
-
-The Docker build uses `uv` for pip installs. When `--extra-index-url` points to the PyTorch wheel index, `uv` may refuse to fetch packages like `requests` from PyPI if it finds a different version on the PyTorch index first. Fix this by adding `--index-strategy=unsafe-first-match` to `install.sh`:
-
-```bash
-EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
-installRequirements
-```
-
-Most Python backends already do this — check `backend/python/transformers/install.sh` or similar for reference.
-
-## Rebuilding
-
-### Rebuild a single backend
-
-```bash
-make backends/<name>
-```
-
-This runs the Docker build (`Dockerfile.python`), exports the image to `backend-images/<name>.tar`, and installs it into `backends/<name>/`. It also rebuilds the `local-ai` Go binary (without extra tags).
-
-**Important**: If you were previously running with `GO_TAGS=auth`, the `make backends/<name>` step will overwrite your binary without that tag. Rebuild the Go binary afterward:
-
-```bash
-GO_TAGS=auth make build
-```
-
-### Rebuild and restart
-
-After rebuilding a backend, you must restart LocalAI for it to pick up the new backend files. The backend gRPC process is spawned on demand when the model is first loaded.
-
-```bash
-# Kill existing process
-kill <pid>
-
-# Restart
-./local-ai run --debug [your flags]
-```
-
-### Quick iteration (skip Docker rebuild)
-
-For fast iteration on a Python backend's `backend.py` without a full Docker rebuild, you can edit the installed copy directly:
-
-```bash
-# Edit the installed copy
-vim backends/<name>/backend.py
-
-# Restart LocalAI to respawn the gRPC process
-```
-
-This is useful for testing but **does not persist** — the next `make backends/<name>` will overwrite it. Always commit fixes to the source in `backend/python/<name>/`.
-
-## Verification
-
-After fixing and rebuilding:
-
-1. Start LocalAI and confirm the backend registers: look for `Registering backend name="<name>"` in the logs
-2. Trigger the operation that failed (e.g. start a fine-tuning job)
-3. Watch the GRPC stderr/stdout lines for the backend's model ID
-4. Confirm no errors in the traceback
--- a/.agents/llama-cpp-backend.md
+++ b/.agents/llama-cpp-backend.md
@@ -1,77 +0,0 @@
-# llama.cpp Backend
-
-The llama.cpp backend (`backend/cpp/llama-cpp/grpc-server.cpp`) is a gRPC adaptation of the upstream HTTP server (`llama.cpp/tools/server/server.cpp`). It uses the same underlying server infrastructure from `llama.cpp/tools/server/server-context.cpp`.
-
-## Building and Testing
-
- Test llama.cpp backend compilation: `make backends/llama-cpp`
- The backend is built as part of the main build process
- Check `backend/cpp/llama-cpp/Makefile` for build configuration
-
-## Architecture
-
- **grpc-server.cpp**: gRPC server implementation, adapts HTTP server patterns to gRPC
- Uses shared server infrastructure: `server-context.cpp`, `server-task.cpp`, `server-queue.cpp`, `server-common.cpp`
- The gRPC server mirrors the HTTP server's functionality but uses gRPC instead of HTTP
-
-## Common Issues When Updating llama.cpp
-
-When fixing compilation errors after upstream changes:
-1. Check how `server.cpp` (HTTP server) handles the same change
-2. Look for new public APIs or getter methods
-3. Store copies of needed data instead of accessing private members
-4. Update function calls to match new signatures
-5. Test with `make backends/llama-cpp`
-
-## Key Differences from HTTP Server
-
- gRPC uses `BackendServiceImpl` class with gRPC service methods
- HTTP server uses `server_routes` with HTTP handlers
- Both use the same `server_context` and task queue infrastructure
- gRPC methods: `LoadModel`, `Predict`, `PredictStream`, `Embedding`, `Rerank`, `TokenizeString`, `GetMetrics`, `Health`
-
-## Tool Call Parsing Maintenance
-
-When working on JSON/XML tool call parsing functionality, always check llama.cpp for the reference implementation and any updates:
-
-### Checking for XML Parsing Changes
-
-1. **Review XML Format Definitions**: Check `llama.cpp/common/chat-parser-xml-toolcall.h` for `xml_tool_call_format` struct changes
-2. **Review Parsing Logic**: Check `llama.cpp/common/chat-parser-xml-toolcall.cpp` for parsing algorithm updates
-3. **Review Format Presets**: Check `llama.cpp/common/chat-parser.cpp` for new XML format presets (search for `xml_tool_call_format form`)
-4. **Review Model Lists**: Check `llama.cpp/common/chat.h` for `COMMON_CHAT_FORMAT_*` enum values that use XML parsing:
-   - `COMMON_CHAT_FORMAT_GLM_4_5`
-   - `COMMON_CHAT_FORMAT_MINIMAX_M2`
-   - `COMMON_CHAT_FORMAT_KIMI_K2`
-   - `COMMON_CHAT_FORMAT_QWEN3_CODER_XML`
-   - `COMMON_CHAT_FORMAT_APRIEL_1_5`
-   - `COMMON_CHAT_FORMAT_XIAOMI_MIMO`
-   - Any new formats added
-
-### Model Configuration Options
-
-Always check `llama.cpp` for new model configuration options that should be supported in LocalAI:
-
-1. **Check Server Context**: Review `llama.cpp/tools/server/server-context.cpp` for new parameters
-2. **Check Chat Params**: Review `llama.cpp/common/chat.h` for `common_chat_params` struct changes
-3. **Check Server Options**: Review `llama.cpp/tools/server/server.cpp` for command-line argument changes
-4. **Examples of options to check**:
-   - `ctx_shift` - Context shifting support
-   - `parallel_tool_calls` - Parallel tool calling
-   - `reasoning_format` - Reasoning format options
-   - Any new flags or parameters
-
-### Implementation Guidelines
-
-1. **Feature Parity**: Always aim for feature parity with llama.cpp's implementation
-2. **Test Coverage**: Add tests for new features matching llama.cpp's behavior
-3. **Documentation**: Update relevant documentation when adding new formats or options
-4. **Backward Compatibility**: Ensure changes don't break existing functionality
-
-### Files to Monitor
-
- `llama.cpp/common/chat-parser-xml-toolcall.h` - Format definitions
- `llama.cpp/common/chat-parser-xml-toolcall.cpp` - Parsing logic
- `llama.cpp/common/chat-parser.cpp` - Format presets and model-specific handlers
- `llama.cpp/common/chat.h` - Format enums and parameter structures
- `llama.cpp/tools/server/server-context.cpp` - Server configuration options
--- a/.agents/testing-mcp-apps.md
+++ b/.agents/testing-mcp-apps.md
@@ -1,120 +0,0 @@
-# Testing MCP Apps (Interactive Tool UIs)
-
-MCP Apps is an extension to MCP where tools declare interactive HTML UIs via `_meta.ui.resourceUri`. When the LLM calls such a tool, the UI renders the app in a sandboxed iframe inline in the chat. The app communicates bidirectionally with the host via `postMessage` (JSON-RPC) and can call server tools, send messages, and update model context.
-
-Spec: https://modelcontextprotocol.io/extensions/apps/overview
-
-## Quick Start: Run a Test MCP App Server
-
-The `@modelcontextprotocol/server-basic-react` npm package is a ready-to-use test server that exposes a `get-time` tool with an interactive React clock UI. It requires Node >= 20, so run it in Docker:
-
-```bash
-docker run -d --name mcp-app-test -p 3001:3001 node:22-slim \
-  sh -c 'npx -y @modelcontextprotocol/server-basic-react'
-```
-
-Wait ~10 seconds for it to start, then verify:
-
-```bash
-# Check it's running
-docker logs mcp-app-test
-# Expected: "MCP server listening on http://localhost:3001/mcp"
-
-# Verify MCP protocol works
-curl -s -X POST http://localhost:3001/mcp \
-  -H 'Content-Type: application/json' \
-  -H 'Accept: application/json, text/event-stream' \
-  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}'
-
-# List tools — should show get-time with _meta.ui.resourceUri
-curl -s -X POST http://localhost:3001/mcp \
-  -H 'Content-Type: application/json' \
-  -H 'Accept: application/json, text/event-stream' \
-  -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
-```
-
-The `tools/list` response should contain:
-```json
-{
-  "name": "get-time",
-  "_meta": {
-    "ui": { "resourceUri": "ui://get-time/mcp-app.html" }
-  }
-}
-```
-
-## Testing in LocalAI's UI
-
-1. Make sure LocalAI is running (e.g. `http://localhost:8080`)
-2. Build the React UI: `cd core/http/react-ui && npm install && npm run build`
-3. Open the Chat page in your browser
-4. Click **"Client MCP"** in the chat header
-5. Add a new client MCP server:
-   - **URL**: `http://localhost:3001/mcp`
-   - **Use CORS proxy**: enabled (default) — required because the browser can't hit `localhost:3001` directly due to CORS; LocalAI's proxy at `/api/cors-proxy` handles it
-6. The server should connect and discover the `get-time` tool
-7. Select a model and send: **"What time is it?"**
-8. The LLM should call the `get-time` tool
-9. The tool result should render the interactive React clock app in an iframe as a standalone chat message (not inside the collapsed activity group)
-
-## What to Verify
-
- [ ] Tool appears in the connected tools list (not filtered — `get-time` is callable by the LLM)
- [ ] The iframe renders as a standalone chat message with a puzzle-piece icon
- [ ] The app loads and is interactive (clock UI, buttons work)
- [ ] No "Reconnect to MCP server" overlay (connection is live)
- [ ] Console logs show bidirectional communication:
-  - `tools/call` messages from app to host (app calling server tools)
-  - `ui/message` notifications (app sending messages)
- [ ] After the app renders, the LLM continues and produces a text response with the time
- [ ] Non-UI tools continue to work normally (text-only results)
- [ ] Page reload shows the HTML statically with a reconnect overlay until you reconnect
-
-## Console Log Patterns
-
-Healthy bidirectional communication looks like:
-
-```
-Parsed message { jsonrpc: "2.0", id: N, result: {...} }     // Bridge init
-get-time result: { content: [...] }                          // Tool result received
-Calling get-time tool...                                     // App calls tool
-Sending message { method: "tools/call", ... }                // App -> host -> server
-Parsed message { jsonrpc: "2.0", id: N, result: {...} }     // Server response
-Sending message text to Host: ...                            // App sends message
-Sending message { method: "ui/message", ... }                // Message notification
-Message accepted                                             // Host acknowledged
-```
-
-Benign warnings to ignore:
- `Source map error: ... about:srcdoc` — browser devtools can't find source maps for srcdoc iframes
- `Ignoring message from unknown source` — duplicate postMessage from iframe navigation
- `notifications/cancelled` — app cleaning up previous requests
-
-## Architecture Notes
-
- **No server-side changes needed** — the MCP App protocol runs entirely in the browser
- `PostMessageTransport` wraps `window.postMessage` between host and `srcdoc` iframe
- `AppBridge` (from `@modelcontextprotocol/ext-apps`) auto-forwards `tools/call`, `resources/read`, `resources/list` from the app to the MCP server via the host's `Client`
- The iframe uses `sandbox="allow-scripts allow-forms"` (no `allow-same-origin`) — opaque origin, no access to host cookies/DOM/localStorage
- App-only tools (`_meta.ui.visibility: "app-only"`) are filtered from the LLM's tool list but remain callable by the app iframe
-
-## Key Files
-
- `core/http/react-ui/src/components/MCPAppFrame.jsx` — iframe + AppBridge component
- `core/http/react-ui/src/hooks/useMCPClient.js` — MCP client hook with app UI helpers (`hasAppUI`, `getAppResource`, `getClientForTool`, `getToolDefinition`)
- `core/http/react-ui/src/hooks/useChat.js` — agentic loop, attaches `appUI` to tool_result messages
- `core/http/react-ui/src/pages/Chat.jsx` — renders MCPAppFrame as standalone chat messages
-
-## Other Test Servers
-
-The `@modelcontextprotocol/ext-apps` repo has many example servers:
- `@modelcontextprotocol/server-basic-react` — simple clock (React)
- More examples at https://github.com/modelcontextprotocol/ext-apps/tree/main/examples
-
-All examples support both stdio and HTTP transport. Run without `--stdio` for HTTP mode on port 3001.
-
-## Cleanup
-
-```bash
-docker rm -f mcp-app-test
-```
--- a/.air.toml
+++ b/.air.toml
@@ -1,8 +0,0 @@
-# .air.toml
-[build]
-cmd = "make build"
-bin = "./local-ai"
-args_bin = [ "--debug" ]
-include_ext = ["go", "html", "yaml", "toml", "json", "txt", "md"]
-exclude_dir = ["pkg/grpc/proto"]
-delay = 1000
--- a/.devcontainer-scripts/postcreate.sh
+++ b/.devcontainer-scripts/postcreate.sh
@@ -1,17 +0,0 @@
-#!/bin/bash
-
-cd /workspace
-
-# Get the files into the volume without a bind mount
-if [ ! -d ".git" ]; then
-    git clone https://github.com/mudler/LocalAI.git .
-else
-    git fetch
-fi
-
-echo "Standard Post-Create script completed."
-
-if [ -f "/devcontainer-customization/postcreate.sh" ]; then
-    echo "Launching customization postcreate.sh"
-    bash "/devcontainer-customization/postcreate.sh"
-fi
--- a/.devcontainer-scripts/poststart.sh
+++ b/.devcontainer-scripts/poststart.sh
@@ -1,13 +0,0 @@
-#!/bin/bash
-
-cd /workspace
-
-# Ensures generated source files are present upon load
-make prepare
-
-echo "Standard Post-Start script completed."
-
-if [ -f "/devcontainer-customization/poststart.sh" ]; then
-    echo "Launching customization poststart.sh"
-    bash "/devcontainer-customization/poststart.sh"
-fi
--- a/.devcontainer-scripts/utils.sh
+++ b/.devcontainer-scripts/utils.sh
@@ -1,55 +0,0 @@
-#!/bin/bash
-
-# This file contains some really simple functions that are useful when building up customization scripts.
-
-
-# Checks if the git config has a user registered - and sets it up if not.
-#
-# Param 1: name
-# Param 2: email
-#
-config_user() {
-    echo "Configuring git for $1 <$2>"
-    local gcn=$(git config --global user.name)
-    if [ -z "${gcn}" ]; then
-        echo "Setting up git user / remote"
-        git config --global user.name "$1"
-        git config --global user.email "$2"
-        
-    fi
-}
-
-# Checks if the git remote is configured - and sets it up if not. Fetches either way.
-#
-# Param 1: remote name
-# Param 2: remote url
-#
-config_remote() {
-    echo "Adding git remote and fetching $2 as $1"
-    local gr=$(git remote -v | grep $1)
-    if [ -z "${gr}" ]; then
-        git remote add $1 $2
-    fi
-    git fetch $1
-}
-
-# Setup special .ssh files
-# Prints out lines of text to make things pretty
-# Param 1: bash array, filenames relative to the customization directory that should be copied to ~/.ssh
-setup_ssh() {
-    echo "starting ~/.ssh directory setup..."
-    mkdir -p "${HOME}/.ssh"
-    chmod 0700 "${HOME}/.ssh"
-    echo "-----"
-    local files=("$@")
-    for file in "${files[@]}" ; do
-        local cfile="/devcontainer-customization/${file}"
-        local hfile="${HOME}/.ssh/${file}"
-        if [ ! -f "${hfile}" ]; then
-            echo "copying \"${file}\""
-            cp "${cfile}" "${hfile}"
-            chmod 600 "${hfile}"
-        fi
-    done
-    echo "~/.ssh directory setup complete!"
-}
--- a/.devcontainer/customization/README.md
+++ b/.devcontainer/customization/README.md
@@ -1,25 +0,0 @@
-Place any additional resources your environment requires in this directory
-
-Script hooks are currently called for:
-`postcreate.sh` and `poststart.sh`
-
-If files with those names exist here, they will be called at the end of the normal script.
-
-This is a good place to set things like `git config --global user.name`, and to handle any other files that are mounted via this directory.
-
-To assist in doing so, `source /.devcontainer-scripts/utils.sh` will provide utility functions that may be useful - for example:
-
-```
-#!/bin/bash
-
-source "/.devcontainer-scripts/utils.sh"
-
-sshfiles=("config" "key.pub")
-
-setup_ssh "${sshfiles[@]}"
-
-config_user "YOUR NAME" "YOUR EMAIL"
-
-config_remote "REMOTE NAME" "REMOTE URL"
-
-```
--- a/.devcontainer/devcontainer.json
+++ b/.devcontainer/devcontainer.json
@@ -1,24 +0,0 @@
-{
-    "$schema": "https://raw.githubusercontent.com/devcontainers/spec/main/schemas/devContainer.schema.json",
-    "name": "LocalAI",
-    "workspaceFolder": "/workspace",
-    "dockerComposeFile": [ "./docker-compose-devcontainer.yml" ],
-    "service": "api",
-    "shutdownAction": "stopCompose",
-    "customizations": {
-        "vscode": {
-            "extensions": [
-                "golang.go",
-                "ms-vscode.makefile-tools",
-                "ms-azuretools.vscode-docker",
-                "ms-python.python",
-                "ms-python.debugpy",
-                "wayou.vscode-todo-highlight",
-                "waderyan.gitblame"
-            ]
-        }
-    },
-    "forwardPorts": [8080, 3000],
-    "postCreateCommand": "bash /.devcontainer-scripts/postcreate.sh",
-    "postStartCommand": "bash /.devcontainer-scripts/poststart.sh"
-}
--- a/.devcontainer/docker-compose-devcontainer.yml
+++ b/.devcontainer/docker-compose-devcontainer.yml
@@ -1,48 +0,0 @@
-services:
-  api:
-    build:
-      context: ..
-      dockerfile: Dockerfile
-      target: devcontainer
-    env_file:
-      - ../.env
-    ports:
-      - 8080:8080
-    volumes:
-      - localai_workspace:/workspace
-      - models:/host-models
-      - backends:/host-backends
-      - ./customization:/devcontainer-customization
-    command: /bin/sh -c "while sleep 1000; do :; done"
-    cap_add:
-      - SYS_PTRACE
-    security_opt:
-      - seccomp:unconfined
-  prometheus:
-    image: prom/prometheus
-    container_name: prometheus
-    command:
-      - '--config.file=/etc/prometheus/prometheus.yml'
-    ports:
-      - 9090:9090
-    restart: unless-stopped
-    volumes:
-      - ./prometheus:/etc/prometheus
-      - prom_data:/prometheus
-  grafana:
-    image: grafana/grafana
-    container_name: grafana
-    ports:
-      - 3000:3000
-    restart: unless-stopped
-    environment:
-      - GF_SECURITY_ADMIN_USER=admin
-      - GF_SECURITY_ADMIN_PASSWORD=grafana
-    volumes:
-      - ./grafana:/etc/grafana/provisioning/datasources
-
-volumes:
-  prom_data:
-  localai_workspace:
-  models:
-  backends:
--- a/.devcontainer/grafana/datasource.yml
+++ b/.devcontainer/grafana/datasource.yml
@@ -1,10 +0,0 @@
-
-apiVersion: 1
-
-datasources:
- name: Prometheus
-  type: prometheus
-  url: http://prometheus:9090 
-  isDefault: true
-  access: proxy
-  editable: true
--- a/.devcontainer/prometheus/prometheus.yml
+++ b/.devcontainer/prometheus/prometheus.yml
@@ -1,21 +0,0 @@
-global:
-  scrape_interval: 15s
-  scrape_timeout: 10s
-  evaluation_interval: 15s
-alerting:
-  alertmanagers:
-  - static_configs:
-    - targets: []
-    scheme: http
-    timeout: 10s
-    api_version: v1
-scrape_configs:
- job_name: prometheus
-  honor_timestamps: true
-  scrape_interval: 15s
-  scrape_timeout: 10s
-  metrics_path: /metrics
-  scheme: http
-  static_configs:
-  - targets:
-    - localhost:9090
--- a/.dockerignore
+++ b/.dockerignore
@@ -1,23 +1,8 @@
 .idea
 .github
 .vscode
-.devcontainer
 models
-backends
 examples/chatbot-ui/models
-backend/go/image/stablediffusion-ggml/build/
-backend/go/*/build
-backend/go/*/.cache
-backend/go/*/sources
-backend/go/*/package
 examples/rwkv/models
 examples/**/models
-Dockerfile*
-__pycache__
-
-# SonarQube
-.scannerwork
-
-# backend virtual environments
-**/venv
-backend/python/**/source
+Dockerfile*
--- a/.env
+++ b/.env
@@ -10,7 +10,7 @@
 #
 ## Define galleries.
 ## models will to install will be visible in `/models/available`
-# LOCALAI_GALLERIES=[{"name":"localai", "url":"github:mudler/LocalAI/gallery/index.yaml@master"}]
+# LOCALAI_GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}]

 ## CORS settings
 # LOCALAI_CORS=true
@@ -26,14 +26,24 @@
 ## Disables COMPEL (Diffusers)
 # COMPEL=0

-## Disables SD_EMBED (Diffusers)
-# SD_EMBED=0
-
 ## Enable/Disable single backend (useful if only one GPU is available)
 # LOCALAI_SINGLE_ACTIVE_BACKEND=true

-# Forces shutdown of the backends if busy (only if LOCALAI_SINGLE_ACTIVE_BACKEND is set)
-# LOCALAI_FORCE_BACKEND_SHUTDOWN=true
+## Specify a build type. Available: cublas, openblas, clblas.
+## cuBLAS: This is a GPU-accelerated version of the complete standard BLAS (Basic Linear Algebra Subprograms) library. It's provided by Nvidia and is part of their CUDA toolkit.
+## OpenBLAS: This is an open-source implementation of the BLAS library that aims to provide highly optimized code for various platforms. It includes support for multi-threading and can be compiled to use hardware-specific features for additional performance. OpenBLAS can run on many kinds of hardware, including CPUs from Intel, AMD, and ARM.
+## clBLAS:   This is an open-source implementation of the BLAS library that uses OpenCL, a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. clBLAS is designed to take advantage of the parallel computing power of GPUs but can also run on any hardware that supports OpenCL. This includes hardware from different vendors like Nvidia, AMD, and Intel.
+# BUILD_TYPE=openblas
+
+## Uncomment and set to true to enable rebuilding from source
+# REBUILD=true
+
+## Enable go tags, available: stablediffusion, tts
+## stablediffusion: image generation with stablediffusion
+## tts: enables text-to-speech with go-piper 
+## (requires REBUILD=true)
+#
+# GO_TAGS=stablediffusion

 ## Path where to store generated images
 # LOCALAI_IMAGE_PATH=/tmp/generated/images
@@ -61,26 +71,9 @@
 ### Define the number of parallel LLAMA.cpp workers (Defaults to 1)
 # LLAMACPP_PARALLEL=1

-### Define a list of GRPC Servers for llama-cpp workers to distribute the load
-# https://github.com/ggerganov/llama.cpp/pull/6829
-# https://github.com/ggerganov/llama.cpp/blob/master/tools/rpc/README.md
-# LLAMACPP_GRPC_SERVERS=""
-
 ### Enable to run parallel requests
 # LOCALAI_PARALLEL_REQUESTS=true

-# Enable to allow p2p mode
-# LOCALAI_P2P=true
-
-# Enable to use federated mode
-# LOCALAI_FEDERATED=true
-
-# Enable to start federation server
-# FEDERATED_SERVER=true
-
-# Define to use federation token
-# TOKEN=""
-
 ### Watchdog settings
 ###
 # Enables watchdog to kill backends that are inactive for too much time
@@ -93,4 +86,4 @@
 # LOCALAI_WATCHDOG_BUSY=true
 #
 # Time in duration format (e.g. 1h30m) after which a backend is considered busy
-# LOCALAI_WATCHDOG_BUSY_TIMEOUT=5m
+# LOCALAI_WATCHDOG_BUSY_TIMEOUT=5m
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,2 +1 @@
 *.sh text eol=lf
-backend/cpp/llama/*.hpp linguist-vendored
--- a/.github/bump_deps.sh
+++ b/.github/bump_deps.sh
@@ -3,25 +3,7 @@ set -xe
 REPO=$1
 BRANCH=$2
 VAR=$3
-FILE=$4
-
-if [ -z "$FILE" ]; then
-    FILE="Makefile"
-fi

 LAST_COMMIT=$(curl -s -H "Accept: application/vnd.github.VERSION.sha" "https://api.github.com/repos/$REPO/commits/$BRANCH")

-# Read $VAR from Makefile (only first match)
-set +e
-CURRENT_COMMIT="$(grep -m1 "^$VAR?=" $FILE | cut -d'=' -f2)"
-set -e
-
-sed -i $FILE -e "s/$VAR?=.*/$VAR?=$LAST_COMMIT/"
-
-if [ -z "$CURRENT_COMMIT" ]; then
-    echo "Could not find $VAR in Makefile."
-    exit 0
-fi
-
-echo "Changes: https://github.com/$REPO/compare/${CURRENT_COMMIT}..${LAST_COMMIT}" >> "${VAR}_message.txt"
-echo "${LAST_COMMIT}" >> "${VAR}_commit.txt"
+sed -i Makefile -e "s/$VAR?=.*/$VAR?=$LAST_COMMIT/"
--- a/.github/bump_docs.sh
+++ b/.github/bump_docs.sh
@@ -2,6 +2,6 @@
 set -xe
 REPO=$1

-LATEST_TAG=$(curl -s "https://api.github.com/repos/$REPO/releases/latest" | jq -r '.tag_name')
+LATEST_TAG=$(curl -s "https://api.github.com/repos/$REPO/releases/latest" | jq -r '.name')

 cat <<< $(jq ".version = \"$LATEST_TAG\"" docs/data/version.json) > docs/data/version.json
--- a/.github/check_and_update.py
+++ b/.github/check_and_update.py
@@ -1,85 +0,0 @@
-import hashlib
-from huggingface_hub import hf_hub_download, get_paths_info
-import requests
-import sys
-import os
-
-uri = sys.argv[1]
-file_name = uri.split('/')[-1]
-
-# Function to parse the URI and determine download method
-def parse_uri(uri):
-    if uri.startswith('huggingface://'):
-        repo_id = uri.split('://')[1]
-        return 'huggingface', repo_id.rsplit('/', 1)[0]
-    elif 'huggingface.co' in uri:
-        parts = uri.split('/resolve/')
-        if len(parts) > 1:
-            repo_path = parts[0].split('https://huggingface.co/')[-1]
-            return 'huggingface', repo_path
-    return 'direct', uri
-
-def calculate_sha256(file_path):
-    sha256_hash = hashlib.sha256()
-    with open(file_path, 'rb') as f:
-        for byte_block in iter(lambda: f.read(4096), b''):
-            sha256_hash.update(byte_block)
-    return sha256_hash.hexdigest()
-
-def manual_safety_check_hf(repo_id):
-    scanResponse = requests.get('https://huggingface.co/api/models/' + repo_id + "/scan")
-    scan = scanResponse.json()
-    # Check if 'hasUnsafeFile' exists in the response
-    if 'hasUnsafeFile' in scan:
-        if scan['hasUnsafeFile']:
-            return scan
-        else:
-            return None
-    else:
-        return None
-
-download_type, repo_id_or_url = parse_uri(uri)
-
-new_checksum =  None
-file_path = None
-
-# Decide download method based on URI type
-if download_type == 'huggingface':
-    # Check if the repo is flagged as dangerous by HF
-    hazard = manual_safety_check_hf(repo_id_or_url)
-    if hazard is not None:
-        print(f'Error: HuggingFace has detected security problems for {repo_id_or_url}: {str(hazard)}', file=sys.stderr)
-        sys.exit(5)
-    # Use HF API to pull sha
-    for file in get_paths_info(repo_id_or_url, [file_name], repo_type='model'):
-        try:
-            new_checksum = file.lfs.sha256
-            break
-        except Exception as e:
-            print(f'Error from Hugging Face Hub: {str(e)}', file=sys.stderr)
-            sys.exit(2)
-    if new_checksum is None:
-        try:
-            file_path = hf_hub_download(repo_id=repo_id_or_url, filename=file_name)
-        except Exception as e:
-            print(f'Error from Hugging Face Hub: {str(e)}', file=sys.stderr)
-            sys.exit(2)
-else:
-    response = requests.get(repo_id_or_url)
-    if response.status_code == 200:
-        with open(file_name, 'wb') as f:
-            f.write(response.content)
-        file_path = file_name
-    elif response.status_code == 404:
-        print(f'File not found: {response.status_code}', file=sys.stderr)
-        sys.exit(2)
-    else:
-        print(f'Error downloading file: {response.status_code}', file=sys.stderr)
-        sys.exit(1)
-
-if new_checksum is None:
-    new_checksum = calculate_sha256(file_path)
-    print(new_checksum)
-    os.remove(file_path)
-else:
-    print(new_checksum)
--- a/.github/checksum_checker.sh
+++ b/.github/checksum_checker.sh
@@ -1,63 +0,0 @@
-#!/bin/bash
-# This script needs yq and huggingface_hub to be installed
-# To install huggingface_hub, run: pip install huggingface_hub
-
-# Path to the input YAML file
-input_yaml=$1
-
-# Function to download file and check checksum using Python
-function check_and_update_checksum() {
-    model_name="$1"
-    file_name="$2"
-    uri="$3"
-    old_checksum="$4"
-    idx="$5"
-
-    # Download the file and calculate new checksum using Python
-    new_checksum=$(python3 ./.github/check_and_update.py $uri)
-    result=$?
-
-    if [[ $result -eq 5 ]]; then
-        echo "Contaminated entry detected, deleting entry for $model_name..."
-        yq eval -i "del(.[$idx])" "$input_yaml"
-        return
-    fi
-
-    if [[ "$new_checksum" == "" ]]; then
-        echo "Error calculating checksum for $file_name. Skipping..."
-        return
-    fi
-
-    echo "Checksum for $file_name: $new_checksum"
-
-    # Compare and update the YAML file if checksums do not match
-    
-    if [[ $result -eq 2 ]]; then
-        echo "File not found, deleting entry for $file_name..."
-        # yq eval -i "del(.[$idx].files[] | select(.filename == \"$file_name\"))" "$input_yaml"
-    elif [[ "$old_checksum" != "$new_checksum" ]]; then
-        echo "Checksum mismatch for $file_name. Updating..."
-        yq eval -i "del(.[$idx].files[] | select(.filename == \"$file_name\").sha256)" "$input_yaml"
-        yq eval -i "(.[$idx].files[] | select(.filename == \"$file_name\")).sha256 = \"$new_checksum\"" "$input_yaml"
-    elif [[ $result -ne 0 ]]; then
-        echo "Error downloading file $file_name. Skipping..."
-    else
-        echo "Checksum match for $file_name. No update needed."
-    fi
-}
-
-# Read the YAML and process each file
-len=$(yq eval '. | length' "$input_yaml")
-for ((i=0; i<$len; i++))
-do
-    name=$(yq eval ".[$i].name" "$input_yaml")
-    files_len=$(yq eval ".[$i].files | length" "$input_yaml")
-    for ((j=0; j<$files_len; j++))
-    do
-        filename=$(yq eval ".[$i].files[$j].filename" "$input_yaml")
-        uri=$(yq eval ".[$i].files[$j].uri" "$input_yaml")
-        checksum=$(yq eval ".[$i].files[$j].sha256" "$input_yaml")
-        echo "Checking model $name, file $filename. URI = $uri, Checksum = $checksum"
-        check_and_update_checksum "$name" "$filename" "$uri" "$checksum" "$i"
-    done
-done
--- a/.github/ci/modelslist.go
+++ b/.github/ci/modelslist.go
@@ -1,304 +0,0 @@
-package main
-
-import (
-	"fmt"
-	"html/template"
-	"io/ioutil"
-	"os"
-
-	"github.com/microcosm-cc/bluemonday"
-	"gopkg.in/yaml.v3"
-)
-
-var modelPageTemplate string = `
-<!DOCTYPE html>
-<html>
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
-    <title>LocalAI models</title>
-    <link href="https://cdnjs.cloudflare.com/ajax/libs/flowbite/2.3.0/flowbite.min.css" rel="stylesheet" />
-    <script src="https://cdn.jsdelivr.net/npm/vanilla-lazyload@19.1.3/dist/lazyload.min.js"></script>
-
-    <link
-    rel="stylesheet"
-    href="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.8.0/build/styles/default.min.css"
-  />
-    <script
-    defer
-    src="https://cdn.jsdelivr.net/gh/highlightjs/cdn-release@11.8.0/build/highlight.min.js"
-  ></script>
-    <script
-    defer
-    src="https://cdn.jsdelivr.net/npm/alpinejs@3.x.x/dist/cdn.min.js"
-  ></script>
-  <script
-    defer
-    src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"
-  ></script>
-  <script
-    defer
-    src="https://cdn.jsdelivr.net/npm/dompurify@3.0.6/dist/purify.min.js"
-  ></script>
-
-  <link href="/static/general.css" rel="stylesheet" />
-    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;600;700&family=Roboto:wght@400;500&display=swap" rel="stylesheet">
-    <link
-    href="https://fonts.googleapis.com/css?family=Roboto:300,400,500,700,900&display=swap"
-    rel="stylesheet" />
-  <link
-    rel="stylesheet"
-    href="https://cdn.jsdelivr.net/npm/tw-elements/css/tw-elements.min.css" />
-  <script src="https://cdn.tailwindcss.com/3.3.0"></script>
-  <script>
-    tailwind.config = {
-      darkMode: "class",
-      theme: {
-        fontFamily: {
-          sans: ["Roboto", "sans-serif"],
-          body: ["Roboto", "sans-serif"],
-          mono: ["ui-monospace", "monospace"],
-        },
-      },
-      corePlugins: {
-        preflight: false,
-      },
-    };
-  </script>
-    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.1.1/css/all.min.css">
-    <script src="https://unpkg.com/htmx.org@1.9.12" integrity="sha384-ujb1lZYygJmzgSwoxRggbCHcjc0rB2XoQrxeTUQyRjrOnlCoYta87iKBWq3EsdM2" crossorigin="anonymous"></script>
-</head>
-
-<body class="bg-gray-900 text-gray-200">
-<div class="flex flex-col min-h-screen">
-
-<nav class="bg-gray-800 shadow-lg">
-    <div class="container mx-auto px-4 py-4">
-        <div class="flex items-center justify-between">
-            <div class="flex items-center">
-                <a href="/" class="text-white text-xl font-bold"><img src="https://github.com/mudler/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd" alt="LocalAI Logo" class="h-10 mr-3 border-2 border-gray-300 shadow rounded"></a>
-                <a href="/" class="text-white text-xl font-bold">LocalAI</a>
-            </div>
-            <!-- Menu button for small screens -->
-            <div class="lg:hidden">
-                <button id="menu-toggle" class="text-gray-400 hover:text-white focus:outline-none">
-                    <i class="fas fa-bars fa-lg"></i>
-                </button>
-            </div>
-            <!-- Navigation links -->
-            <div class="hidden lg:flex lg:items-center lg:justify-end lg:flex-1 lg:w-0">
-                <a href="https://localai.io" class="text-gray-400 hover:text-white px-3 py-2 rounded" target="_blank" ><i class="fas fa-book-reader pr-2"></i> Documentation</a>
-            </div>
-        </div>
-        <!-- Collapsible menu for small screens -->
-        <div class="hidden lg:hidden" id="mobile-menu">
-            <div class="pt-4 pb-3 border-t border-gray-700">
-
-                <a href="https://localai.io" class="block text-gray-400 hover:text-white px-3 py-2 rounded mt-1" target="_blank" ><i class="fas fa-book-reader pr-2"></i> Documentation</a>
-
-            </div>
-        </div>
-    </div>
-</nav>
-
-<style>
-  .is-hidden {
-	display: none;
-	  }
-</style>
-
-<div class="container mx-auto px-4 flex-grow">
-
-<div class="models mt-12">
-	<h2 class="text-center text-3xl font-semibold text-gray-100">
-	LocalAI model gallery list </h2><br>
-
-	<h2 class="text-center text-3xl font-semibold text-gray-100">
-
-	 🖼️ Available {{.AvailableModels}} models</i> <a href="https://localai.io/models/" target="_blank" >
-			<i class="fas fa-circle-info pr-2"></i>
-		</a></h2>
-
-	<h3>
-	Refer to the Model gallery <a href="https://localai.io/models/" target="_blank" ><i class="fas fa-circle-info pr-2"></i></a> for more information on how to use the models with LocalAI.<br>
-
-	You can install models with the CLI command <code>local-ai models install &lt;model-name&gt;</code> or by using the WebUI.
-	</h3>
-
-	<input class="form-control appearance-none block w-full mt-5 px-3 py-2 text-base font-normal text-gray-300 pb-2 mb-5 bg-gray-800 bg-clip-padding border border-solid border-gray-600 rounded transition ease-in-out m-0 focus:text-gray-300 focus:bg-gray-900 focus:border-blue-500 focus:outline-none" type="search"
-	id="searchbox" placeholder="Live search keyword..">
-	  <div class="dark grid grid-cols-1 grid-rows-1 md:grid-cols-3 block rounded-lg shadow-secondary-1 dark:bg-surface-dark">
-		{{ range $_, $model := .Models }}
-		<div class="box me-4 mb-2 block rounded-lg bg-white shadow-secondary-1  dark:bg-gray-800 dark:bg-surface-dark dark:text-white text-surface pb-2">
-		<div>
-		    {{ $icon := "https://upload.wikimedia.org/wikipedia/commons/6/65/No-Image-Placeholder.svg" }}
-			{{ if $model.Icon }}
-	  		{{ $icon = $model.Icon }}
-	  		{{ end }}
-			<div class="flex justify-center items-center">
-				<img data-src="{{ $icon }}" alt="{{$model.Name}}" class="rounded-t-lg max-h-48 max-w-96 object-cover mt-3 lazy">
-			</div>
-	  		<div class="p-6 text-surface dark:text-white">
-				<h5 class="mb-2 text-xl font-medium leading-tight">{{$model.Name}}</h5>
-
-
-				<p class="mb-4 text-base truncate">{{ $model.Description }}</p>
-
-			</div>
-			<div class="px-6 pt-4 pb-2">
-
-      <!-- Modal toggle -->
-      <button data-modal-target="{{ $model.Name}}-modal" data-modal-toggle="{{ $model.Name }}-modal" class="block text-white bg-blue-700 hover:bg-blue-800 focus:ring-4 focus:outline-none focus:ring-blue-300 font-medium rounded-lg text-sm px-5 py-2.5 text-center dark:bg-blue-600 dark:hover:bg-blue-700 dark:focus:ring-blue-800" type="button">
-        More info
-      </button>
-
-    <!-- Main modal -->
-    <div id="{{ $model.Name}}-modal" tabindex="-1" aria-hidden="true" class="hidden overflow-y-auto overflow-x-hidden fixed top-0 right-0 left-0 z-50 justify-center items-center w-full md:inset-0 h-[calc(100%-1rem)] max-h-full">
-        <div class="relative p-4 w-full max-w-2xl max-h-full">
-            <!-- Modal content -->
-            <div class="relative bg-white rounded-lg shadow dark:bg-gray-700">
-                <!-- Modal header -->
-                <div class="flex items-center justify-between p-4 md:p-5 border-b rounded-t dark:border-gray-600">
-                    <h3 class="text-xl font-semibold text-gray-900 dark:text-white">
-                        {{ $model.Name}}
-                    </h3>
-                    <button type="button" class="text-gray-400 bg-transparent hover:bg-gray-200 hover:text-gray-900 rounded-lg text-sm w-8 h-8 ms-auto inline-flex justify-center items-center dark:hover:bg-gray-600 dark:hover:text-white" data-modal-hide="{{$model.Name}}-modal">
-                        <svg class="w-3 h-3" aria-hidden="true" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 14 14">
-                            <path stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="m1 1 6 6m0 0 6 6M7 7l6-6M7 7l-6 6"/>
-                        </svg>
-                        <span class="sr-only">Close modal</span>
-                    </button>
-                </div>
-                <!-- Modal body -->
-                <div class="p-4 md:p-5 space-y-4">
-                    <div class="flex justify-center items-center">
-                    <img data-src="{{ $icon }}" alt="{{$model.Name}}" class="lazy rounded-t-lg max-h-48 max-w-96 object-cover mt-3">
-                  </div>
-
-                    <p class="text-base leading-relaxed text-gray-500 dark:text-gray-400">
-                    {{ $model.Description }}
-
-                    </p>
-
-                    <p class="text-base leading-relaxed text-gray-500 dark:text-gray-400">
-                    To install the model with the CLI, run: <br>
-                    <code> local-ai models install {{$model.Name}} </code> <br>
-
-                    <hr>
-                    See also <a href="https://localai.io/models/" target="_blank" >
-                    Installation <i class="fas fa-circle-info pr-2"></i>
-                    </a> to see how to install models with the REST API.
-                    </p>
-
-                    <p class="text-base leading-relaxed text-gray-500 dark:text-gray-400">
-                    <ul>
-                    {{ range $_, $u := $model.URLs }}
-                    <li><a href="{{ $u }}" target=_blank><i class="fa-solid fa-link"></i> {{ $u }}</a></li>
-                    {{ end }}
-                    </ul>
-                    </p>
-                </div>
-                <!-- Modal footer -->
-                <div class="flex items-center p-4 md:p-5 border-t border-gray-200 rounded-b dark:border-gray-600">
-                    <button data-modal-hide="{{ $model.Name}}-modal" type="button" class="py-2.5 px-5 ms-3 text-sm font-medium text-gray-900 focus:outline-none bg-white rounded-lg border border-gray-200 hover:bg-gray-100 hover:text-blue-700 focus:z-10 focus:ring-4 focus:ring-gray-100 dark:focus:ring-gray-700 dark:bg-gray-800 dark:text-gray-400 dark:border-gray-600 dark:hover:text-white dark:hover:bg-gray-700">Close</button>
-                </div>
-            </div>
-        </div>
-    </div>
-
-
-			</div>
-		</div>
-		</div>
-		{{ end }}
-
-		</div>
-  </div>
-</div>
-
-<script>
-var lazyLoadInstance = new LazyLoad({
-  // Your custom settings go here
-});
-
-let cards = document.querySelectorAll('.box')
-
-function liveSearch() {
-    let search_query = document.getElementById("searchbox").value;
-
-    //Use innerText if all contents are visible
-    //Use textContent for including hidden elements
-    for (var i = 0; i < cards.length; i++) {
-        if(cards[i].textContent.toLowerCase()
-                .includes(search_query.toLowerCase())) {
-            cards[i].classList.remove("is-hidden");
-        } else {
-            cards[i].classList.add("is-hidden");
-        }
-    }
-}
-
-//A little delay
-let typingTimer;
-let typeInterval = 500;
-let searchInput = document.getElementById('searchbox');
-
-searchInput.addEventListener('keyup', () => {
-    clearTimeout(typingTimer);
-    typingTimer = setTimeout(liveSearch, typeInterval);
-});
-</script>
-
-</div>
-
-<script src="https://cdnjs.cloudflare.com/ajax/libs/flowbite/2.3.0/flowbite.min.js"></script>
-</body>
-</html>
-`
-
-type GalleryModel struct {
-	Name        string   `json:"name" yaml:"name"`
-	URLs        []string `json:"urls" yaml:"urls"`
-	Icon        string   `json:"icon" yaml:"icon"`
-	Description string   `json:"description" yaml:"description"`
-}
-
-func main() {
-	// read the YAML file which contains the models
-
-	f, err := ioutil.ReadFile(os.Args[1])
-	if err != nil {
-		fmt.Println("Error reading file:", err)
-		return
-	}
-
-	models := []*GalleryModel{}
-	err = yaml.Unmarshal(f, &models)
-	if err != nil {
-		// write to stderr
-		os.Stderr.WriteString("Error unmarshaling YAML: " + err.Error() + "\n")
-		return
-	}
-
-	// Ensure that all arbitrary text content is sanitized before display
-	for i, m := range models {
-		models[i].Name = bluemonday.StrictPolicy().Sanitize(m.Name)
-		models[i].Description = bluemonday.StrictPolicy().Sanitize(m.Description)
-	}
-
-	// render the template
-	data := struct {
-		Models          []*GalleryModel
-		AvailableModels int
-	}{
-		Models:          models,
-		AvailableModels: len(models),
-	}
-	tmpl := template.Must(template.New("modelPage").Parse(modelPageTemplate))
-
-	err = tmpl.Execute(os.Stdout, data)
-	if err != nil {
-		fmt.Println("Error executing template:", err)
-		return
-	}
-}
--- a/.github/dependabot.yml
+++ b/.github/dependabot.yml
@@ -1,16 +1,10 @@
 # https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file
 version: 2
 updates:
-  - package-ecosystem: "gitsubmodule"
-    directory: "/"
-    schedule:
-      interval: "weekly"
  - package-ecosystem: "gomod"
    directory: "/"
    schedule:
      interval: "weekly"
-    ignore:
-    - dependency-name: "github.com/mudler/LocalAI/pkg/grpc/proto"
  - package-ecosystem: "github-actions"
    # Workflow files stored in the default location of `.github/workflows`. (You don't need to specify `/.github/workflows` for `directory`. You can use `directory: "/"`.)
    directory: "/"
@@ -29,91 +23,3 @@ updates:
    schedule:
      # Check for updates to GitHub Actions every weekday
      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/bark"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/common/template"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/coqui"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/diffusers"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/exllama"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/exllama2"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/mamba"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/openvoice"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/rerankers"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/sentencetransformers"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/transformers"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/backend/python/vllm"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/examples/chainlit"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/examples/functions"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/examples/langchain/langchainpy-localai-example"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/examples/langchain-chroma"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "pip"
-    directory: "/examples/streamlit-bot"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "docker"
-    directory: "/examples/k8sgpt"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "docker"
-    directory: "/examples/kubernetes"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "docker"
-    directory: "/examples/langchain"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "gomod"
-    directory: "/examples/semantic-todo"
-    schedule:
-      interval: "weekly"
-  - package-ecosystem: "docker"
-    directory: "/examples/telegram-bot"
-    schedule:
-      interval: "weekly"
--- a/.github/gallery-agent/agent.go
+++ b/.github/gallery-agent/agent.go
@@ -1,446 +0,0 @@
-package main
-
-import (
-	"context"
-	"encoding/json"
-	"fmt"
-	"io"
-	"net/http"
-	"os"
-	"regexp"
-	"slices"
-	"strings"
-
-	"github.com/ghodss/yaml"
-	hfapi "github.com/mudler/LocalAI/pkg/huggingface-api"
-	"github.com/mudler/cogito"
-	"github.com/mudler/cogito/clients"
-	"github.com/mudler/cogito/structures"
-	"github.com/sashabaranov/go-openai/jsonschema"
-)
-
-var (
-	openAIModel      = os.Getenv("OPENAI_MODEL")
-	openAIKey        = os.Getenv("OPENAI_KEY")
-	openAIBaseURL    = os.Getenv("OPENAI_BASE_URL")
-	galleryIndexPath = os.Getenv("GALLERY_INDEX_PATH")
-	// default client
-	llm = clients.NewOpenAILLM(openAIModel, openAIKey, openAIBaseURL)
-)
-
-// cleanTextContent strips trailing whitespace (including \r), collapses consecutive empty lines,
-// and removes <think>/<thinking> blocks, to prevent YAML linting issues such as trailing spaces
-func cleanTextContent(text string) string {
-	lines := strings.Split(text, "\n")
-	var cleanedLines []string
-	var prevEmpty bool
-	for _, line := range lines {
-		// Remove all trailing whitespace (spaces, tabs, etc.)
-		trimmed := strings.TrimRight(line, " \t\r")
-		// Avoid multiple consecutive empty lines
-		if trimmed == "" {
-			if !prevEmpty {
-				cleanedLines = append(cleanedLines, "")
-			}
-			prevEmpty = true
-		} else {
-			cleanedLines = append(cleanedLines, trimmed)
-			prevEmpty = false
-		}
-	}
-	// Remove trailing empty lines from the result
-	result := strings.Join(cleanedLines, "\n")
-	return stripThinkingTags(strings.TrimRight(result, "\n"))
-}
-
-type galleryModel struct {
-	Name string   `yaml:"name"`
-	Urls []string `yaml:"urls"`
-}
-
-// isModelExisting reports whether modelID appears verbatim in the urls list of any gallery index entry
-func isModelExisting(modelID string) (bool, error) {
-	indexPath := getGalleryIndexPath()
-	content, err := os.ReadFile(indexPath)
-	if err != nil {
-		return false, fmt.Errorf("failed to read %s: %w", indexPath, err)
-	}
-
-	var galleryModels []galleryModel
-
-	err = yaml.Unmarshal(content, &galleryModels)
-	if err != nil {
-		return false, fmt.Errorf("failed to unmarshal %s: %w", indexPath, err)
-	}
-
-	for _, galleryModel := range galleryModels {
-		if slices.Contains(galleryModel.Urls, modelID) {
-			return true, nil
-		}
-	}
-
-	return false, nil
-}
-
-// filterExistingModels removes models that already exist in the gallery
-func filterExistingModels(models []ProcessedModel) ([]ProcessedModel, error) {
-	var filteredModels []ProcessedModel
-	for _, model := range models {
-		exists, err := isModelExisting(model.ModelID)
-		if err != nil {
-			fmt.Printf("Error checking if model %s exists: %v, skipping\n", model.ModelID, err)
-			continue
-		}
-
-		if !exists {
-			filteredModels = append(filteredModels, model)
-		} else {
-			fmt.Printf("Skipping existing model: %s\n", model.ModelID)
-		}
-	}
-
-	fmt.Printf("Filtered out %d existing models, %d new models remaining\n",
-		len(models)-len(filteredModels), len(filteredModels))
-
-	return filteredModels, nil
-}
-
-// getGalleryIndexPath returns the gallery index file path, with a default fallback
-func getGalleryIndexPath() string {
-	if galleryIndexPath != "" {
-		return galleryIndexPath
-	}
-	return "gallery/index.yaml"
-}
-
-func stripThinkingTags(content string) string {
-	// Remove content between <thinking> and </thinking> (including multi-line)
-	content = regexp.MustCompile(`(?s)<thinking>.*?</thinking>`).ReplaceAllString(content, "")
-	// Remove content between <think> and </think> (including multi-line)
-	content = regexp.MustCompile(`(?s)<think>.*?</think>`).ReplaceAllString(content, "")
-	// Clean up any extra whitespace
-	content = strings.TrimSpace(content)
-	return content
-}
-
-func getRealReadme(ctx context.Context, repository string) (string, error) {
-	// Create a conversation fragment
-	fragment := cogito.NewEmptyFragment().
-		AddMessage("user",
-			`Your task is to get a clear description of a large language model from huggingface by using the provided tool. I will share with you a repository that might be quantized, and as such probably not by the original model author. We need to get the real  description of the model, and not the one that might be quantized. You will have to call the tool to get the readme more than once by figuring out from the quantized readme which is the base model readme. This is the repository: `+repository)
-
-	// Execute with tools
-	result, err := cogito.ExecuteTools(llm, fragment,
-		cogito.WithIterations(3),
-		cogito.WithMaxAttempts(3),
-		cogito.DisableSinkState,
-		cogito.WithTools(&HFReadmeTool{client: hfapi.NewClient()}))
-	if err != nil {
-		return "", err
-	}
-
-	result = result.AddMessage("user", "Describe the model in a clear and concise way that can be shared in a model gallery.")
-
-	// Get a response
-	_, err = llm.Ask(ctx, result)
-	if err != nil {
-		return "", err
-	}
-
-	content := result.LastMessage().Content
-	return cleanTextContent(content), nil
-}
-
-func selectMostInterestingModels(ctx context.Context, searchResult *SearchResult) ([]ProcessedModel, error) {
-
-	if len(searchResult.Models) == 1 {
-		return searchResult.Models, nil
-	}
-
-	// Create a conversation fragment
-	fragment := cogito.NewEmptyFragment().
-		AddMessage("user",
-			`Your task is to analyze a list of AI models and select the most interesting ones for a model gallery. You will be given detailed information about multiple models including their metadata, file information, and README content.
-
-Consider the following criteria when selecting models:
-1. Model popularity (download count)
-2. Model recency (last modified date)
-3. Model completeness (has preferred model file, README, etc.)
-4. Model uniqueness (not duplicates or very similar models)
-5. Model quality (based on README content and description)
-6. Model utility (practical applications)
-
-You should select models that would be most valuable for users browsing a model gallery. Prioritize models that are:
- Well-documented with clear READMEs
- Recently updated
- Popular (high download count)
- Have the preferred quantization format available
- Offer unique capabilities or are from reputable authors
-
-Return your analysis and selection reasoning.`)
-
-	// Add the search results as context
-	modelsInfo := fmt.Sprintf("Found %d models matching '%s' with quantization preference '%s':\n\n",
-		searchResult.TotalModelsFound, searchResult.SearchTerm, searchResult.Quantization)
-
-	for i, model := range searchResult.Models {
-		modelsInfo += fmt.Sprintf("Model %d:\n", i+1)
-		modelsInfo += fmt.Sprintf("  ID: %s\n", model.ModelID)
-		modelsInfo += fmt.Sprintf("  Author: %s\n", model.Author)
-		modelsInfo += fmt.Sprintf("  Downloads: %d\n", model.Downloads)
-		modelsInfo += fmt.Sprintf("  Last Modified: %s\n", model.LastModified)
-		modelsInfo += fmt.Sprintf("  Files: %d files\n", len(model.Files))
-
-		if model.PreferredModelFile != nil {
-			modelsInfo += fmt.Sprintf("  Preferred Model File: %s (%d bytes)\n",
-				model.PreferredModelFile.Path, model.PreferredModelFile.Size)
-		} else {
-			modelsInfo += "  No preferred model file found\n"
-		}
-
-		if model.ReadmeContent != "" {
-			modelsInfo += fmt.Sprintf("  README: %s\n", model.ReadmeContent)
-		}
-
-		if model.ProcessingError != "" {
-			modelsInfo += fmt.Sprintf("  Processing Error: %s\n", model.ProcessingError)
-		}
-
-		modelsInfo += "\n"
-	}
-
-	fragment = fragment.AddMessage("user", modelsInfo)
-
-	fragment = fragment.AddMessage("user", "Based on your analysis, select the top 5 most interesting models and provide a brief explanation for each selection. Also, create a filtered SearchResult with only the selected models. Return just a list of repositories IDs, you will later be asked to output it as a JSON array with the json tool.")
-
-	// Get a response
-	newFragment, err := llm.Ask(ctx, fragment)
-	if err != nil {
-		return nil, err
-	}
-
-	fmt.Println(newFragment.LastMessage().Content)
-	repositories := struct {
-		Repositories []string `json:"repositories"`
-	}{}
-
-	s := structures.Structure{
-		Schema: jsonschema.Definition{
-			Type:                 jsonschema.Object,
-			AdditionalProperties: false,
-			Properties: map[string]jsonschema.Definition{
-				"repositories": {
-					Type:        jsonschema.Array,
-					Items:       &jsonschema.Definition{Type: jsonschema.String},
-					Description: "The trending repositories IDs",
-				},
-			},
-			Required: []string{"repositories"},
-		},
-		Object: &repositories,
-	}
-
-	err = newFragment.ExtractStructure(ctx, llm, s)
-	if err != nil {
-		return nil, err
-	}
-
-	filteredModels := []ProcessedModel{}
-	for _, m := range searchResult.Models {
-		if slices.Contains(repositories.Repositories, m.ModelID) {
-			filteredModels = append(filteredModels, m)
-		}
-	}
-
-	return filteredModels, nil
-}
-
-// ModelMetadata represents extracted metadata from a model
-type ModelMetadata struct {
-	Tags    []string `json:"tags"`
-	License string   `json:"license"`
-}
-
-// extractModelMetadata extracts tags and license from model README and documentation
-func extractModelMetadata(ctx context.Context, model ProcessedModel) ([]string, string, error) {
-	// Create a conversation fragment
-	fragment := cogito.NewEmptyFragment().
-		AddMessage("user",
-			`Your task is to extract metadata from an AI model's README and documentation. You will be provided with:
-1. Model information (ID, author, description)
-2. README content
-
-You need to extract:
-1. **Tags**: An array of relevant tags that describe the model. Use common tags from the gallery such as:
-   - llm, gguf, gpu, cpu, multimodal, image-to-text, text-to-text, text-to-speech, tts
-   - thinking, reasoning, chat, instruction-tuned, code, vision
-   - Model family names (e.g., llama, qwen, mistral, gemma) if applicable
-   - Any other relevant descriptive tags
-   Select 3-8 most relevant tags.
-
-2. **License**: The license identifier (e.g., "apache-2.0", "mit", "llama2", "gpl-3.0", "bsd", "cc-by-4.0").
-   If no license is found, return an empty string.
-
-Return the extracted metadata in a structured format.`)
-
-	// Add model information
-	modelInfo := "Model Information:\n"
-	modelInfo += fmt.Sprintf("  ID: %s\n", model.ModelID)
-	modelInfo += fmt.Sprintf("  Author: %s\n", model.Author)
-	modelInfo += fmt.Sprintf("  Downloads: %d\n", model.Downloads)
-	if model.ReadmeContent != "" {
-		modelInfo += fmt.Sprintf("  README Content:\n%s\n", model.ReadmeContent)
-	} else if model.ReadmeContentPreview != "" {
-		modelInfo += fmt.Sprintf("  README Preview: %s\n", model.ReadmeContentPreview)
-	}
-
-	fragment = fragment.AddMessage("user", modelInfo)
-	fragment = fragment.AddMessage("user", "Extract the tags and license from the model information. Return the metadata as a JSON object with 'tags' (array of strings) and 'license' (string).")
-
-	// Get a response
-	newFragment, err := llm.Ask(ctx, fragment)
-	if err != nil {
-		return nil, "", err
-	}
-
-	// Extract structured metadata
-	metadata := ModelMetadata{}
-
-	s := structures.Structure{
-		Schema: jsonschema.Definition{
-			Type:                 jsonschema.Object,
-			AdditionalProperties: false,
-			Properties: map[string]jsonschema.Definition{
-				"tags": {
-					Type:        jsonschema.Array,
-					Items:       &jsonschema.Definition{Type: jsonschema.String},
-					Description: "Array of relevant tags describing the model",
-				},
-				"license": {
-					Type:        jsonschema.String,
-					Description: "License identifier (e.g., apache-2.0, mit, llama2). Empty string if not found.",
-				},
-			},
-			Required: []string{"tags", "license"},
-		},
-		Object: &metadata,
-	}
-
-	err = newFragment.ExtractStructure(ctx, llm, s)
-	if err != nil {
-		return nil, "", err
-	}
-
-	return metadata.Tags, metadata.License, nil
-}
-
-// extractIconFromReadme scans the README content for image URLs and returns the first suitable icon URL found
-func extractIconFromReadme(readmeContent string) string {
-	if readmeContent == "" {
-		return ""
-	}
-
-	// Regular expressions to match image URLs in various formats (case-insensitive)
-	// Match markdown image syntax: ![alt](url) - case insensitive extensions
-	markdownImageRegex := regexp.MustCompile(`(?i)!\[[^\]]*\]\(([^)]+\.(png|jpg|jpeg|svg|webp|gif))\)`)
-	// Match HTML img tags: <img src="url">
-	htmlImageRegex := regexp.MustCompile(`(?i)<img[^>]+src=["']([^"']+\.(png|jpg|jpeg|svg|webp|gif))["']`)
-	// Match plain URLs ending with image extensions
-	plainImageRegex := regexp.MustCompile(`(?i)https?://[^\s<>"']+\.(png|jpg|jpeg|svg|webp|gif)`)
-
-	// Try markdown format first
-	matches := markdownImageRegex.FindStringSubmatch(readmeContent)
-	if len(matches) > 1 && matches[1] != "" {
-		url := strings.TrimSpace(matches[1])
-		// Accept only absolute URLs (those starting with "http"); relative paths are skipped
-		if strings.HasPrefix(strings.ToLower(url), "http") {
-			return url
-		}
-	}
-
-	// Try HTML img tags
-	matches = htmlImageRegex.FindStringSubmatch(readmeContent)
-	if len(matches) > 1 && matches[1] != "" {
-		url := strings.TrimSpace(matches[1])
-		if strings.HasPrefix(strings.ToLower(url), "http") {
-			return url
-		}
-	}
-
-	// Try plain URLs
-	matches = plainImageRegex.FindStringSubmatch(readmeContent)
-	if len(matches) > 0 {
-		url := strings.TrimSpace(matches[0])
-		if strings.HasPrefix(strings.ToLower(url), "http") {
-			return url
-		}
-	}
-
-	return ""
-}
-
-// getHuggingFaceAvatarURL attempts to get the HuggingFace avatar URL for a user
-func getHuggingFaceAvatarURL(author string) string {
-	if author == "" {
-		return ""
-	}
-
-	// Try to fetch user info from HuggingFace API
-	// HuggingFace API endpoint: https://huggingface.co/api/users/{username}
-	baseURL := "https://huggingface.co"
-	userURL := fmt.Sprintf("%s/api/users/%s", baseURL, author)
-
-	req, err := http.NewRequest("GET", userURL, nil)
-	if err != nil {
-		return ""
-	}
-
-	client := &http.Client{}
-	resp, err := client.Do(req)
-	if err != nil {
-		return ""
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode != http.StatusOK {
-		return ""
-	}
-
-	// Parse the response to get avatar URL
-	var userInfo map[string]any
-	body, err := io.ReadAll(resp.Body)
-	if err != nil {
-		return ""
-	}
-
-	if err := json.Unmarshal(body, &userInfo); err != nil {
-		return ""
-	}
-
-	// Try to extract avatar URL from response
-	if avatar, ok := userInfo["avatarUrl"].(string); ok && avatar != "" {
-		return avatar
-	}
-	if avatar, ok := userInfo["avatar"].(string); ok && avatar != "" {
-		return avatar
-	}
-
-	return ""
-}
-
-// extractModelIcon extracts icon URL from README or falls back to HuggingFace avatar
-func extractModelIcon(model ProcessedModel) string {
-	// First, try to extract icon from README
-	if icon := extractIconFromReadme(model.ReadmeContent); icon != "" {
-		return icon
-	}
-
-	// Fallback: Try to get HuggingFace user avatar
-	if model.Author != "" {
-		if avatar := getHuggingFaceAvatarURL(model.Author); avatar != "" {
-			return avatar
-		}
-	}
-
-	return ""
-}
--- a/.github/gallery-agent/gallery.go
+++ b/.github/gallery-agent/gallery.go
@@ -1,213 +0,0 @@
-package main
-
-import (
-	"context"
-	"encoding/json"
-	"fmt"
-	"os"
-	"strings"
-
-	"github.com/ghodss/yaml"
-	"github.com/mudler/LocalAI/core/gallery/importers"
-)
-
-func formatTextContent(text string) string {
-	return formatTextContentWithIndent(text, 4, 6)
-}
-
-// formatTextContentWithIndent formats text content with specified base and list item indentation
-func formatTextContentWithIndent(text string, baseIndent int, listItemIndent int) string {
-	var formattedLines []string
-	lines := strings.Split(text, "\n")
-	for _, line := range lines {
-		trimmed := strings.TrimRight(line, " \t\r")
-		if trimmed == "" {
-			// Keep empty lines as empty (no indentation)
-			formattedLines = append(formattedLines, "")
-		} else {
-			// Preserve relative indentation from yaml.Marshal output
-			// Count existing leading spaces to preserve relative structure
-			leadingSpaces := len(trimmed) - len(strings.TrimLeft(trimmed, " \t"))
-			trimmedStripped := strings.TrimLeft(trimmed, " \t")
-
-			var totalIndent int
-			if strings.HasPrefix(trimmedStripped, "-") {
-				// List items: use listItemIndent (ignore existing leading spaces)
-				totalIndent = listItemIndent
-			} else {
-				// Regular lines: use baseIndent + preserve relative indentation
-				// This handles both top-level keys (leadingSpaces=0) and nested properties (leadingSpaces>0)
-				totalIndent = baseIndent + leadingSpaces
-			}
-
-			indentStr := strings.Repeat(" ", totalIndent)
-			formattedLines = append(formattedLines, indentStr+trimmedStripped)
-		}
-	}
-	formattedText := strings.Join(formattedLines, "\n")
-	// Remove any trailing spaces from the formatted description
-	formattedText = strings.TrimRight(formattedText, " \t")
-	return formattedText
-}
-
-// generateYAMLEntry generates a gallery YAML entry for a model using the specified quantization
-func generateYAMLEntry(model ProcessedModel, quantization string) string {
-	modelConfig, err := importers.DiscoverModelConfig("https://huggingface.co/"+model.ModelID, json.RawMessage(`{ "quantization": "`+quantization+`"}`))
-	if err != nil {
-		panic(err)
-	}
-
-	// Extract model name from ModelID
-	parts := strings.Split(model.ModelID, "/")
-	modelName := model.ModelID
-	if len(parts) > 0 {
-		modelName = strings.ToLower(parts[len(parts)-1])
-	}
-	// Remove common suffixes
-	modelName = strings.ReplaceAll(modelName, "-gguf", "")
-	modelName = strings.ReplaceAll(modelName, "-q4_k_m", "")
-	modelName = strings.ReplaceAll(modelName, "-q4_k_s", "")
-	modelName = strings.ReplaceAll(modelName, "-q3_k_m", "")
-	modelName = strings.ReplaceAll(modelName, "-q2_k", "")
-
-	description := model.ReadmeContent
-	if description == "" {
-		description = fmt.Sprintf("AI model: %s", modelName)
-	}
-
-	// Clean up description to prevent YAML linting issues
-	description = cleanTextContent(description)
-	formattedDescription := formatTextContent(description)
-
-	// Strip name and description from config file since they are
-	// already present at the gallery entry level and should not
-	// appear under overrides.
-	configFileContent := modelConfig.ConfigFile
-	var cfgMap map[string]any
-	if err := yaml.Unmarshal([]byte(configFileContent), &cfgMap); err == nil {
-		delete(cfgMap, "name")
-		delete(cfgMap, "description")
-		if cleaned, err := yaml.Marshal(cfgMap); err == nil {
-			configFileContent = string(cleaned)
-		}
-	}
-
-	configFile := formatTextContent(configFileContent)
-
-	filesYAML, _ := yaml.Marshal(modelConfig.Files)
-
-	// Files section: list items need 4 spaces (not 6), since files: is at 2 spaces
-	files := formatTextContentWithIndent(string(filesYAML), 4, 4)
-
-	// Build metadata sections
-	var metadataSections []string
-
-	// Add license if present
-	if model.License != "" {
-		metadataSections = append(metadataSections, fmt.Sprintf(`  license: "%s"`, model.License))
-	}
-
-	// Add tags if present
-	if len(model.Tags) > 0 {
-		tagsYAML, _ := yaml.Marshal(model.Tags)
-		tagsFormatted := formatTextContentWithIndent(string(tagsYAML), 4, 4)
-		tagsFormatted = strings.TrimRight(tagsFormatted, "\n")
-		metadataSections = append(metadataSections, fmt.Sprintf("  tags:\n%s", tagsFormatted))
-	}
-
-	// Add icon if present
-	if model.Icon != "" {
-		metadataSections = append(metadataSections, fmt.Sprintf(`  icon: %s`, model.Icon))
-	}
-
-	// Build the metadata block
-	metadataBlock := ""
-	if len(metadataSections) > 0 {
-		metadataBlock = strings.Join(metadataSections, "\n") + "\n"
-	}
-
-	yamlTemplate := ""
-	yamlTemplate = `- name: "%s"
-  url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
-  urls:
-    - https://huggingface.co/%s
-  description: |
-%s%s
-  overrides:
-%s
-  files:
-%s`
-	// Trim trailing newlines from formatted sections to prevent extra blank lines
-	formattedDescription = strings.TrimRight(formattedDescription, "\n")
-	configFile = strings.TrimRight(configFile, "\n")
-	files = strings.TrimRight(files, "\n")
-	// Add newline before metadata block if present
-	if metadataBlock != "" {
-		metadataBlock = "\n" + strings.TrimRight(metadataBlock, "\n")
-	}
-	return fmt.Sprintf(yamlTemplate,
-		modelName,
-		model.ModelID,
-		formattedDescription,
-		metadataBlock,
-		configFile,
-		files,
-	)
-}
-
-// generateYAMLForModels generates YAML entries for selected models and prepends them to the gallery index file
-func generateYAMLForModels(ctx context.Context, models []ProcessedModel, quantization string) error {
-
-	// Generate YAML entries for each model
-	var yamlEntries []string
-	for _, model := range models {
-		fmt.Printf("Generating YAML entry for model: %s\n", model.ModelID)
-
-		// Generate YAML entry
-		yamlEntry := generateYAMLEntry(model, quantization)
-		yamlEntries = append(yamlEntries, yamlEntry)
-	}
-
-	// Prepend to index.yaml (write at the top)
-	if len(yamlEntries) > 0 {
-		indexPath := getGalleryIndexPath()
-		fmt.Printf("Prepending YAML entries to %s...\n", indexPath)
-
-		// Read current content
-		content, err := os.ReadFile(indexPath)
-		if err != nil {
-			return fmt.Errorf("failed to read %s: %w", indexPath, err)
-		}
-
-		existingContent := string(content)
-		yamlBlock := strings.Join(yamlEntries, "\n")
-
-		// Check if file starts with "---"
-		var newContent string
-		if strings.HasPrefix(existingContent, "---\n") {
-			// File starts with "---", prepend new entries after it
-			restOfContent := strings.TrimPrefix(existingContent, "---\n")
-			// Ensure proper spacing: "---\n" + new entries + "\n" + rest of content
-			newContent = "---\n" + yamlBlock + "\n" + restOfContent
-		} else if strings.HasPrefix(existingContent, "---") {
-			// File starts with "---" but no newline after
-			restOfContent := strings.TrimPrefix(existingContent, "---")
-			newContent = "---\n" + yamlBlock + "\n" + strings.TrimPrefix(restOfContent, "\n")
-		} else {
-			// No "---" at start, prepend new entries at the very beginning
-			// Trim leading whitespace from existing content
-			existingContent = strings.TrimLeft(existingContent, " \t\n\r")
-			newContent = yamlBlock + "\n" + existingContent
-		}
-
-		// Write back to file
-		err = os.WriteFile(indexPath, []byte(newContent), 0644)
-		if err != nil {
-			return fmt.Errorf("failed to write %s: %w", indexPath, err)
-		}
-
-		fmt.Printf("Successfully prepended %d models to %s\n", len(yamlEntries), indexPath)
-	}
-
-	return nil
-}
--- a/.github/gallery-agent/main.go
+++ b/.github/gallery-agent/main.go
@@ -1,383 +0,0 @@
-package main
-
-import (
-	"context"
-	"encoding/json"
-	"fmt"
-	"os"
-	"strconv"
-	"strings"
-	"time"
-
-	hfapi "github.com/mudler/LocalAI/pkg/huggingface-api"
-)
-
-// ProcessedModelFile represents a processed model file with additional metadata
-type ProcessedModelFile struct {
-	Path     string `json:"path"`
-	Size     int64  `json:"size"`
-	SHA256   string `json:"sha256"`
-	IsReadme bool   `json:"is_readme"`
-	FileType string `json:"file_type"` // "model", "readme", "other"
-}
-
-// ProcessedModel represents a processed model with all gathered metadata
-type ProcessedModel struct {
-	ModelID                 string               `json:"model_id"`
-	Author                  string               `json:"author"`
-	Downloads               int                  `json:"downloads"`
-	LastModified            string               `json:"last_modified"`
-	Files                   []ProcessedModelFile `json:"files"`
-	PreferredModelFile      *ProcessedModelFile  `json:"preferred_model_file,omitempty"`
-	ReadmeFile              *ProcessedModelFile  `json:"readme_file,omitempty"`
-	ReadmeContent           string               `json:"readme_content,omitempty"`
-	ReadmeContentPreview    string               `json:"readme_content_preview,omitempty"`
-	QuantizationPreferences []string             `json:"quantization_preferences"`
-	ProcessingError         string               `json:"processing_error,omitempty"`
-	Tags                    []string             `json:"tags,omitempty"`
-	License                 string               `json:"license,omitempty"`
-	Icon                    string               `json:"icon,omitempty"`
-}
-
-// SearchResult represents the complete result of searching and processing models
-type SearchResult struct {
-	SearchTerm       string           `json:"search_term"`
-	Limit            int              `json:"limit"`
-	Quantization     string           `json:"quantization"`
-	TotalModelsFound int              `json:"total_models_found"`
-	Models           []ProcessedModel `json:"models"`
-	FormattedOutput  string           `json:"formatted_output"`
-}
-
-// AddedModelSummary represents a summary of models added to the gallery
-type AddedModelSummary struct {
-	SearchTerm     string   `json:"search_term"`
-	TotalFound     int      `json:"total_found"`
-	ModelsAdded    int      `json:"models_added"`
-	AddedModelIDs  []string `json:"added_model_ids"`
-	AddedModelURLs []string `json:"added_model_urls"`
-	Quantization   string   `json:"quantization"`
-	ProcessingTime string   `json:"processing_time"`
-}
-
-func main() {
-	startTime := time.Now()
-
-	// Check for synthetic mode
-	syntheticMode := os.Getenv("SYNTHETIC_MODE")
-	if syntheticMode == "true" || syntheticMode == "1" {
-		fmt.Println("Running in SYNTHETIC MODE - generating random test data")
-		err := runSyntheticMode()
-		if err != nil {
-			fmt.Fprintf(os.Stderr, "Error in synthetic mode: %v\n", err)
-			os.Exit(1)
-		}
-		return
-	}
-
-	// Get configuration from environment variables
-	searchTerm := os.Getenv("SEARCH_TERM")
-	if searchTerm == "" {
-		searchTerm = "GGUF"
-	}
-
-	limitStr := os.Getenv("LIMIT")
-	if limitStr == "" {
-		limitStr = "5"
-	}
-	limit, err := strconv.Atoi(limitStr)
-	if err != nil {
-		fmt.Fprintf(os.Stderr, "Error parsing LIMIT: %v\n", err)
-		os.Exit(1)
-	}
-
-	quantization := os.Getenv("QUANTIZATION")
-
-	maxModels := os.Getenv("MAX_MODELS")
-	if maxModels == "" {
-		maxModels = "1"
-	}
-	maxModelsInt, err := strconv.Atoi(maxModels)
-	if err != nil {
-		fmt.Fprintf(os.Stderr, "Error parsing MAX_MODELS: %v\n", err)
-		os.Exit(1)
-	}
-
-	// Print configuration
-	fmt.Printf("Gallery Agent Configuration:\n")
-	fmt.Printf("  Search Term: %s\n", searchTerm)
-	fmt.Printf("  Limit: %d\n", limit)
-	fmt.Printf("  Quantization: %s\n", quantization)
-	fmt.Printf("  Max Models to Add: %d\n", maxModelsInt)
-	fmt.Printf("  Gallery Index Path: %s\n", os.Getenv("GALLERY_INDEX_PATH"))
-	fmt.Println()
-
-	result, err := searchAndProcessModels(searchTerm, limit, quantization)
-	if err != nil {
-		fmt.Fprintf(os.Stderr, "Error: %v\n", err)
-		os.Exit(1)
-	}
-
-	fmt.Println(result.FormattedOutput)
-	var models []ProcessedModel
-
-	if len(result.Models) > 1 {
-		fmt.Println("More than one model found (", len(result.Models), "), using AI agent to select the most interesting models")
-		for _, model := range result.Models {
-			fmt.Println("Model: ", model.ModelID)
-		}
-		// Use AI agent to select the most interesting models
-		fmt.Println("Using AI agent to select the most interesting models...")
-		models, err = selectMostInterestingModels(context.Background(), result)
-		if err != nil {
-			fmt.Fprintf(os.Stderr, "Error in model selection: %v\n", err)
-			// Continue with original result if selection fails
-			models = result.Models
-		}
-	} else if len(result.Models) == 1 {
-		models = result.Models
-		fmt.Println("Only one model found, using it directly")
-	}
-
-	fmt.Print(models)
-
-	// Filter out models that already exist in the gallery
-	fmt.Println("Filtering out existing models...")
-	models, err = filterExistingModels(models)
-	if err != nil {
-		fmt.Fprintf(os.Stderr, "Error filtering existing models: %v\n", err)
-		os.Exit(1)
-	}
-
-	// Limit to maxModelsInt after filtering
-	if len(models) > maxModelsInt {
-		models = models[:maxModelsInt]
-	}
-
-	// Track added models for summary
-	var addedModelIDs []string
-	var addedModelURLs []string
-
-	// Generate YAML entries and append to gallery/index.yaml
-	if len(models) > 0 {
-		for _, model := range models {
-			addedModelIDs = append(addedModelIDs, model.ModelID)
-			// Generate Hugging Face URL for the model
-			modelURL := fmt.Sprintf("https://huggingface.co/%s", model.ModelID)
-			addedModelURLs = append(addedModelURLs, modelURL)
-		}
-		fmt.Println("Generating YAML entries for selected models...")
-		err = generateYAMLForModels(context.Background(), models, quantization)
-		if err != nil {
-			fmt.Fprintf(os.Stderr, "Error generating YAML entries: %v\n", err)
-			os.Exit(1)
-		}
-	} else {
-		fmt.Println("No new models to add to the gallery.")
-	}
-
-	// Create and write summary
-	processingTime := time.Since(startTime).String()
-	summary := AddedModelSummary{
-		SearchTerm:     searchTerm,
-		TotalFound:     result.TotalModelsFound,
-		ModelsAdded:    len(addedModelIDs),
-		AddedModelIDs:  addedModelIDs,
-		AddedModelURLs: addedModelURLs,
-		Quantization:   quantization,
-		ProcessingTime: processingTime,
-	}
-
-	// Write summary to file
-	summaryData, err := json.MarshalIndent(summary, "", "  ")
-	if err != nil {
-		fmt.Fprintf(os.Stderr, "Error marshaling summary: %v\n", err)
-	} else {
-		err = os.WriteFile("gallery-agent-summary.json", summaryData, 0644)
-		if err != nil {
-			fmt.Fprintf(os.Stderr, "Error writing summary file: %v\n", err)
-		} else {
-			fmt.Printf("Summary written to gallery-agent-summary.json\n")
-		}
-	}
-}
-
-func searchAndProcessModels(searchTerm string, limit int, quantization string) (*SearchResult, error) {
-	client := hfapi.NewClient()
-	var outputBuilder strings.Builder
-
-	fmt.Println("Searching for models...")
-	// Initialize the result struct
-	result := &SearchResult{
-		SearchTerm:   searchTerm,
-		Limit:        limit,
-		Quantization: quantization,
-		Models:       []ProcessedModel{},
-	}
-
-	models, err := client.GetLatest(searchTerm, limit)
-	if err != nil {
-		return nil, fmt.Errorf("failed to fetch models: %w", err)
-	}
-
-	fmt.Println("Models found:", len(models))
-	result.TotalModelsFound = len(models)
-
-	if len(models) == 0 {
-		outputBuilder.WriteString("No models found.\n")
-		result.FormattedOutput = outputBuilder.String()
-		return result, nil
-	}
-
-	outputBuilder.WriteString(fmt.Sprintf("Found %d models matching '%s':\n\n", len(models), searchTerm))
-
-	// Process each model
-	for i, model := range models {
-		outputBuilder.WriteString(fmt.Sprintf("%d. Processing Model: %s\n", i+1, model.ModelID))
-		outputBuilder.WriteString(fmt.Sprintf("   Author: %s\n", model.Author))
-		outputBuilder.WriteString(fmt.Sprintf("   Downloads: %d\n", model.Downloads))
-		outputBuilder.WriteString(fmt.Sprintf("   Last Modified: %s\n", model.LastModified))
-
-		// Initialize processed model struct
-		processedModel := ProcessedModel{
-			ModelID:                 model.ModelID,
-			Author:                  model.Author,
-			Downloads:               model.Downloads,
-			LastModified:            model.LastModified,
-			QuantizationPreferences: []string{quantization, "Q4_K_M", "Q4_K_S", "Q3_K_M", "Q2_K"},
-		}
-
-		// Get detailed model information
-		details, err := client.GetModelDetails(model.ModelID)
-		if err != nil {
-			errorMsg := fmt.Sprintf("   Error getting model details: %v\n", err)
-			outputBuilder.WriteString(errorMsg)
-			processedModel.ProcessingError = err.Error()
-			result.Models = append(result.Models, processedModel)
-			continue
-		}
-
-		// Define quantization preferences (in order of preference)
-		quantizationPreferences := []string{quantization, "Q4_K_M", "Q4_K_S", "Q3_K_M", "Q2_K"}
-
-		// Find preferred model file
-		preferredModelFile := hfapi.FindPreferredModelFile(details.Files, quantizationPreferences)
-
-		// Process files
-		processedFiles := make([]ProcessedModelFile, len(details.Files))
-		for j, file := range details.Files {
-			fileType := "other"
-			if file.IsReadme {
-				fileType = "readme"
-			} else if preferredModelFile != nil && file.Path == preferredModelFile.Path {
-				fileType = "model"
-			}
-
-			processedFiles[j] = ProcessedModelFile{
-				Path:     file.Path,
-				Size:     file.Size,
-				SHA256:   file.SHA256,
-				IsReadme: file.IsReadme,
-				FileType: fileType,
-			}
-		}
-
-		processedModel.Files = processedFiles
-
-		// Set preferred model file
-		if preferredModelFile != nil {
-			for _, file := range processedFiles {
-				if file.Path == preferredModelFile.Path {
-					processedModel.PreferredModelFile = &file
-					break
-				}
-			}
-		}
-
-		// Print file information
-		outputBuilder.WriteString(fmt.Sprintf("   Files found: %d\n", len(details.Files)))
-
-		if preferredModelFile != nil {
-			outputBuilder.WriteString(fmt.Sprintf("   Preferred Model File: %s (SHA256: %s)\n",
-				preferredModelFile.Path,
-				preferredModelFile.SHA256))
-		} else {
-			outputBuilder.WriteString(fmt.Sprintf("   No model file found with quantization preferences: %v\n", quantizationPreferences))
-		}
-
-		if details.ReadmeFile != nil {
-			outputBuilder.WriteString(fmt.Sprintf("   README File: %s\n", details.ReadmeFile.Path))
-
-			// Find and set readme file
-			for _, file := range processedFiles {
-				if file.IsReadme {
-					processedModel.ReadmeFile = &file
-					break
-				}
-			}
-
-			fmt.Println("Getting real readme for", model.ModelID, "waiting...")
-			// Use agent to get the real readme and prepare the model description
-			readmeContent, err := getRealReadme(context.Background(), model.ModelID)
-			if err == nil {
-				processedModel.ReadmeContent = readmeContent
-				processedModel.ReadmeContentPreview = truncateString(readmeContent, 200)
-				outputBuilder.WriteString(fmt.Sprintf("   README Content Preview: %s\n",
-					processedModel.ReadmeContentPreview))
-			} else {
-				fmt.Printf("   Warning: Failed to get real readme: %v\n", err)
-			}
-			fmt.Println("Real readme got", readmeContent)
-
-			// Extract metadata (tags, license) from README using LLM
-			fmt.Println("Extracting metadata for", model.ModelID, "waiting...")
-			tags, license, err := extractModelMetadata(context.Background(), processedModel)
-			if err == nil {
-				processedModel.Tags = tags
-				processedModel.License = license
-				outputBuilder.WriteString(fmt.Sprintf("   Tags: %v\n", tags))
-				outputBuilder.WriteString(fmt.Sprintf("   License: %s\n", license))
-			} else {
-				fmt.Printf("   Warning: Failed to extract metadata: %v\n", err)
-			}
-
-			// Extract icon from README or use HuggingFace avatar
-			icon := extractModelIcon(processedModel)
-			if icon != "" {
-				processedModel.Icon = icon
-				outputBuilder.WriteString(fmt.Sprintf("   Icon: %s\n", icon))
-			}
-			// Get README content
-			// readmeContent, err := client.GetReadmeContent(model.ModelID, details.ReadmeFile.Path)
-			// if err == nil {
-			// 	processedModel.ReadmeContent = readmeContent
-			// 	processedModel.ReadmeContentPreview = truncateString(readmeContent, 200)
-			// 	outputBuilder.WriteString(fmt.Sprintf("   README Content Preview: %s\n",
-			// 		processedModel.ReadmeContentPreview))
-			// }
-		}
-
-		// Print all files with their checksums
-		outputBuilder.WriteString("   All Files:\n")
-		for _, file := range processedFiles {
-			outputBuilder.WriteString(fmt.Sprintf("     - %s (%s, %d bytes", file.Path, file.FileType, file.Size))
-			if file.SHA256 != "" {
-				outputBuilder.WriteString(fmt.Sprintf(", SHA256: %s", file.SHA256))
-			}
-			outputBuilder.WriteString(")\n")
-		}
-
-		outputBuilder.WriteString("\n")
-		result.Models = append(result.Models, processedModel)
-	}
-
-	result.FormattedOutput = outputBuilder.String()
-	return result, nil
-}
-
-func truncateString(s string, maxLen int) string {
-	if r := []rune(s); len(r) > maxLen {
-		return string(r[:maxLen]) + "..."
-	}
-	return s
-}
--- a/.github/gallery-agent/testing.go
+++ b/.github/gallery-agent/testing.go
@@ -1,224 +0,0 @@
-package main
-
-import (
-	"context"
-	"fmt"
-	"math/rand/v2"
-	"strings"
-	"time"
-)
-
-// runSyntheticMode generates synthetic test data and appends it to the gallery
-func runSyntheticMode() error {
-	generator := NewSyntheticDataGenerator()
-
-	// Generate a random number of synthetic models (1-3)
-	numModels := generator.rand.IntN(3) + 1
-	fmt.Printf("Generating %d synthetic models for testing...\n", numModels)
-
-	var models []ProcessedModel
-	for range numModels {
-		model := generator.GenerateProcessedModel()
-		models = append(models, model)
-		fmt.Printf("Generated synthetic model: %s\n", model.ModelID)
-	}
-
-	// Generate YAML entries and append to gallery/index.yaml
-	fmt.Println("Generating YAML entries for synthetic models...")
-	err := generateYAMLForModels(context.Background(), models, "Q4_K_M")
-	if err != nil {
-		return fmt.Errorf("error generating YAML entries: %w", err)
-	}
-
-	fmt.Printf("Successfully added %d synthetic models to the gallery for testing!\n", len(models))
-	return nil
-}
-
-// SyntheticDataGenerator provides methods to generate synthetic test data
-type SyntheticDataGenerator struct {
-	rand *rand.Rand
-}
-
-// NewSyntheticDataGenerator creates a new synthetic data generator
-func NewSyntheticDataGenerator() *SyntheticDataGenerator {
-	return &SyntheticDataGenerator{
-		rand: rand.New(rand.NewPCG(uint64(time.Now().UnixNano()), 0)),
-	}
-}
-
-// GenerateProcessedModelFile creates a synthetic ProcessedModelFile
-func (g *SyntheticDataGenerator) GenerateProcessedModelFile() ProcessedModelFile {
-	fileTypes := []string{"model", "readme", "other"}
-	fileType := fileTypes[g.rand.IntN(len(fileTypes))]
-
-	var path string
-	var isReadme bool
-
-	switch fileType {
-	case "model":
-		path = fmt.Sprintf("model-%s.gguf", g.randomString(8))
-		isReadme = false
-	case "readme":
-		path = "README.md"
-		isReadme = true
-	default:
-		path = fmt.Sprintf("file-%s.txt", g.randomString(6))
-		isReadme = false
-	}
-
-	return ProcessedModelFile{
-		Path:     path,
-		Size:     int64(g.rand.IntN(1000000000) + 1000000), // 1MB to 1GB
-		SHA256:   g.randomSHA256(),
-		IsReadme: isReadme,
-		FileType: fileType,
-	}
-}
-
-// GenerateProcessedModel creates a synthetic ProcessedModel
-func (g *SyntheticDataGenerator) GenerateProcessedModel() ProcessedModel {
-	authors := []string{"microsoft", "meta", "google", "openai", "anthropic", "mistralai", "huggingface"}
-	modelNames := []string{"llama", "gpt", "claude", "mistral", "gemma", "phi", "qwen", "codellama"}
-
-	author := authors[g.rand.IntN(len(authors))]
-	modelName := modelNames[g.rand.IntN(len(modelNames))]
-	modelID := fmt.Sprintf("%s/%s-%s", author, modelName, g.randomString(6))
-
-	// Generate files
-	numFiles := g.rand.IntN(5) + 2 // 2-6 files
-	files := make([]ProcessedModelFile, numFiles)
-
-	// Ensure at least one model file and one readme
-	hasModelFile := false
-	hasReadme := false
-
-	for i := range numFiles {
-		files[i] = g.GenerateProcessedModelFile()
-		if files[i].FileType == "model" {
-			hasModelFile = true
-		}
-		if files[i].FileType == "readme" {
-			hasReadme = true
-		}
-	}
-
-	// Add required files if missing
-	if !hasModelFile {
-		modelFile := g.GenerateProcessedModelFile()
-		modelFile.FileType = "model"
-		modelFile.Path = fmt.Sprintf("%s-Q4_K_M.gguf", modelName)
-		files = append(files, modelFile)
-	}
-
-	if !hasReadme {
-		readmeFile := g.GenerateProcessedModelFile()
-		readmeFile.FileType = "readme"
-		readmeFile.Path = "README.md"
-		readmeFile.IsReadme = true
-		files = append(files, readmeFile)
-	}
-
-	// Find preferred model file
-	var preferredModelFile *ProcessedModelFile
-	for i := range files {
-		if files[i].FileType == "model" {
-			preferredModelFile = &files[i]
-			break
-		}
-	}
-
-	// Find readme file
-	var readmeFile *ProcessedModelFile
-	for i := range files {
-		if files[i].FileType == "readme" {
-			readmeFile = &files[i]
-			break
-		}
-	}
-
-	readmeContent := g.generateReadmeContent(modelName, author)
-
-	// Generate sample metadata
-	licenses := []string{"apache-2.0", "mit", "llama2", "gpl-3.0", "bsd", ""}
-	license := licenses[g.rand.IntN(len(licenses))]
-
-	sampleTags := []string{"llm", "gguf", "gpu", "cpu", "text-to-text", "chat", "instruction-tuned"}
-	numTags := g.rand.IntN(4) + 3 // 3-6 tags
-	tags := make([]string, numTags)
-	for i := range numTags {
-		tags[i] = sampleTags[g.rand.IntN(len(sampleTags))]
-	}
-	// Remove duplicates
-	tags = g.removeDuplicates(tags)
-
-	// Optionally include icon (50% chance)
-	icon := ""
-	if g.rand.IntN(2) == 0 {
-		icon = fmt.Sprintf("https://cdn-avatars.huggingface.co/v1/production/uploads/%s.png", g.randomString(24))
-	}
-
-	return ProcessedModel{
-		ModelID:                 modelID,
-		Author:                  author,
-		Downloads:               g.rand.IntN(1000000) + 1000,
-		LastModified:            g.randomDate(),
-		Files:                   files,
-		PreferredModelFile:      preferredModelFile,
-		ReadmeFile:              readmeFile,
-		ReadmeContent:           readmeContent,
-		ReadmeContentPreview:    truncateString(readmeContent, 200),
-		QuantizationPreferences: []string{"Q4_K_M", "Q4_K_S", "Q3_K_M", "Q2_K"},
-		ProcessingError:         "",
-		Tags:                    tags,
-		License:                 license,
-		Icon:                    icon,
-	}
-}
-
-// Helper methods for synthetic data generation
-func (g *SyntheticDataGenerator) randomString(length int) string {
-	const charset = "abcdefghijklmnopqrstuvwxyz0123456789"
-	b := make([]byte, length)
-	for i := range b {
-		b[i] = charset[g.rand.IntN(len(charset))]
-	}
-	return string(b)
-}
-
-func (g *SyntheticDataGenerator) randomSHA256() string {
-	const charset = "0123456789abcdef"
-	b := make([]byte, 64)
-	for i := range b {
-		b[i] = charset[g.rand.IntN(len(charset))]
-	}
-	return string(b)
-}
-
-func (g *SyntheticDataGenerator) randomDate() string {
-	now := time.Now()
-	daysAgo := g.rand.IntN(365) // Random date within last year
-	pastDate := now.AddDate(0, 0, -daysAgo)
-	return pastDate.Format("2006-01-02T15:04:05.000Z")
-}
-
-func (g *SyntheticDataGenerator) removeDuplicates(slice []string) []string {
-	keys := make(map[string]bool)
-	result := []string{}
-	for _, item := range slice {
-		if !keys[item] {
-			keys[item] = true
-			result = append(result, item)
-		}
-	}
-	return result
-}
-
-func (g *SyntheticDataGenerator) generateReadmeContent(modelName, author string) string {
-	templates := []string{
-		fmt.Sprintf("# %s Model\n\nThis is a %s model developed by %s. It's designed for various natural language processing tasks including text generation, question answering, and conversation.\n\n## Features\n\n- High-quality text generation\n- Efficient inference\n- Multiple quantization options\n- Easy to use with LocalAI\n\n## Usage\n\nUse this model with LocalAI for various AI tasks.", strings.Title(modelName), modelName, author),
-		fmt.Sprintf("# %s\n\nA powerful language model from %s. This model excels at understanding and generating human-like text across multiple domains.\n\n## Capabilities\n\n- Text completion\n- Code generation\n- Creative writing\n- Technical documentation\n\n## Model Details\n\n- Architecture: Transformer-based\n- Training: Large-scale supervised learning\n- Quantization: Available in multiple formats", strings.Title(modelName), author),
-		fmt.Sprintf("# %s Language Model\n\nDeveloped by %s, this model represents state-of-the-art performance in natural language understanding and generation.\n\n## Key Features\n\n- Multilingual support\n- Context-aware responses\n- Efficient memory usage\n- Fast inference speed\n\n## Applications\n\n- Chatbots and virtual assistants\n- Content generation\n- Code completion\n- Educational tools", strings.Title(modelName), author),
-	}
-
-	return templates[g.rand.IntN(len(templates))]
-}
--- a/.github/gallery-agent/tools.go
+++ b/.github/gallery-agent/tools.go
@@ -1,46 +0,0 @@
-package main
-
-import (
-	"fmt"
-
-	hfapi "github.com/mudler/LocalAI/pkg/huggingface-api"
-	openai "github.com/sashabaranov/go-openai"
-	jsonschema "github.com/sashabaranov/go-openai/jsonschema"
-)
-
-// HFReadmeTool fetches the README of a Hugging Face repository.
-type HFReadmeTool struct {
-	client *hfapi.Client
-}
-
-func (s *HFReadmeTool) Execute(args map[string]any) (string, any, error) {
-	q, ok := args["repository"].(string)
-	if !ok {
-		return "", nil, fmt.Errorf("missing or non-string \"repository\" argument")
-	}
-	readme, err := s.client.GetReadmeContent(q, "README.md")
-	if err != nil {
-		return "", nil, err
-	}
-	return readme, nil, nil
-}
-
-func (s *HFReadmeTool) Tool() openai.Tool {
-	return openai.Tool{
-		Type: openai.ToolTypeFunction,
-		Function: &openai.FunctionDefinition{
-			Name:        "hf_readme",
-			Description: "A tool to get the README content of a huggingface repository",
-			Parameters: jsonschema.Definition{
-				Type: jsonschema.Object,
-				Properties: map[string]jsonschema.Definition{
-					"repository": {
-						Type:        jsonschema.String,
-						Description: "The huggingface repository to get the README content of",
-					},
-				},
-				Required: []string{"repository"},
-			},
-		},
-	}
-}
--- a/.github/labeler.yml
+++ b/.github/labeler.yml
@@ -1,15 +1,6 @@
-enhancement:
+enhancements:
 - head-branch: ['^feature', 'feature']

-dependencies:
- any:
-  - changed-files:
-    - any-glob-to-any-file: 'Makefile'
-  - changed-files:
-    - any-glob-to-any-file: '*.mod'
-  - changed-files:
-    - any-glob-to-any-file: '*.sum'
-
 kind/documentation:
 - any:
  - changed-files:
@@ -17,11 +8,6 @@ kind/documentation:
  - changed-files:
    - any-glob-to-any-file: '*.md'

-area/ai-model:
- any:
-  - changed-files:
-    - any-glob-to-any-file: 'gallery/*'
-
 examples:
 - any:
  - changed-files:
@@ -30,4 +16,4 @@ examples:
 ci:
 - any:
  - changed-files:
-    - any-glob-to-any-file: '.github/*'
+    - any-glob-to-any-file: '.github/*'
--- a/.github/release.yml
+++ b/.github/release.yml
@@ -13,9 +13,6 @@ changelog:
      labels:
        - bug
        - regression
-    - title: "🖧 P2P area"
-      labels:
-         - area/p2p
    - title: Exciting New Features 🎉
      labels:
        - Semver-Minor
--- a/.github/workflows/backend.yml
+++ b/.github/workflows/backend.yml
--- a/.github/workflows/backend_build.yml
+++ b/.github/workflows/backend_build.yml
@@ -1,250 +0,0 @@
---
-name: 'build backend container images (reusable)'
-
-on:
-  workflow_call:
-    inputs:
-      base-image:
-        description: 'Base image'
-        required: true
-        type: string
-      build-type:
-        description: 'Build type'
-        default: ''
-        type: string
-      cuda-major-version:
-        description: 'CUDA major version'
-        default: "12"
-        type: string
-      cuda-minor-version:
-        description: 'CUDA minor version'
-        default: "1"
-        type: string
-      platforms:
-        description: 'Platforms'
-        default: ''
-        type: string
-      tag-latest:
-        description: 'Tag latest'
-        default: ''
-        type: string
-      tag-suffix:
-        description: 'Tag suffix'
-        default: ''
-        type: string
-      runs-on:
-        description: 'Runs on'
-        required: true
-        default: ''
-        type: string
-      backend:
-        description: 'Backend to build'
-        required: true
-        type: string
-      context:
-        description: 'Build context'
-        required: true
-        type: string
-      dockerfile:
-        description: 'Build Dockerfile'
-        required: true
-        type: string
-      skip-drivers:
-        description: 'Skip drivers'
-        default: 'false'
-        type: string
-      ubuntu-version:
-        description: 'Ubuntu version'
-        required: false
-        default: '2204'
-        type: string
-    secrets:
-      dockerUsername:
-        required: false
-      dockerPassword:
-        required: false
-      quayUsername:
-        required: true
-      quayPassword:
-        required: true
-
-jobs:
-  backend-build:
-    runs-on: ${{ inputs.runs-on }}
-    env:
-        quay_username: ${{ secrets.quayUsername }}
-    steps:
-
-
-      - name: Free Disk Space (Ubuntu)
-        if: inputs.runs-on == 'ubuntu-latest'
-        uses: jlumbroso/free-disk-space@main
-        with:
-          # this might remove tools that are actually needed,
-          # if set to "true" but frees about 6 GB
-          tool-cache: true
-          # all of these default to true, but feel free to set to
-          # "false" if necessary for your workflow
-          android: true
-          dotnet: true
-          haskell: true
-          large-packages: true
-          docker-images: true
-          swap-storage: true
-
-      - name: Force Install GIT latest
-        run: |
-          sudo apt-get update \
-          && sudo apt-get install -y software-properties-common \
-          && sudo apt-get update \
-          && sudo add-apt-repository -y ppa:git-core/ppa \
-          && sudo apt-get update \
-          && sudo apt-get install -y git
-
-      - name: Checkout
-        uses: actions/checkout@v6
-
-      - name: Release space from worker
-        if: inputs.runs-on == 'ubuntu-latest'
-        run: |
-          echo "Listing top largest packages"
-          pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
-          head -n 30 <<< "${pkgs}"
-          echo
-          df -h
-          echo
-          sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
-          sudo apt-get remove --auto-remove android-sdk-platform-tools snapd || true
-          sudo apt-get purge --auto-remove android-sdk-platform-tools snapd || true
-          sudo rm -rf /usr/local/lib/android
-          sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
-          sudo rm -rf /usr/share/dotnet
-          sudo apt-get remove -y '^mono-.*' || true
-          sudo apt-get remove -y '^ghc-.*' || true
-          sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
-          sudo apt-get remove -y 'php.*' || true
-          sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
-          sudo apt-get remove -y '^google-.*' || true
-          sudo apt-get remove -y azure-cli || true
-          sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
-          sudo apt-get remove -y '^gfortran-.*' || true
-          sudo apt-get remove -y microsoft-edge-stable || true
-          sudo apt-get remove -y firefox || true
-          sudo apt-get remove -y powershell || true
-          sudo apt-get remove -y r-base-core || true
-          sudo apt-get autoremove -y
-          sudo apt-get clean
-          echo
-          echo "Listing top largest packages"
-          pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
-          head -n 30 <<< "${pkgs}"
-          echo
-          sudo rm -rfv build || true
-          sudo rm -rf /usr/share/dotnet || true
-          sudo rm -rf /opt/ghc || true
-          sudo rm -rf "/usr/local/share/boost" || true
-          sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
-          df -h
-
-      - name: Docker meta
-        id: meta
-        if: github.event_name != 'pull_request'
-        uses: docker/metadata-action@v6
-        with:
-          images: |
-            quay.io/go-skynet/local-ai-backends
-            localai/localai-backends
-          tags: |
-            type=ref,event=branch
-            type=semver,pattern={{raw}}
-            type=sha
-          flavor: |
-            latest=${{ inputs.tag-latest }}
-            suffix=${{ inputs.tag-suffix }},onlatest=true
-
-      - name: Docker meta for PR
-        id: meta_pull_request
-        if: github.event_name == 'pull_request'
-        uses: docker/metadata-action@v6
-        with:
-          images: |
-            quay.io/go-skynet/ci-tests
-          tags: |
-            type=ref,event=branch,suffix=${{ github.event.number }}-${{ inputs.backend }}-${{ inputs.build-type }}-${{ inputs.cuda-major-version }}-${{ inputs.cuda-minor-version }}
-            type=semver,pattern={{raw}},suffix=${{ github.event.number }}-${{ inputs.backend }}-${{ inputs.build-type }}-${{ inputs.cuda-major-version }}-${{ inputs.cuda-minor-version }}
-            type=sha,suffix=${{ github.event.number }}-${{ inputs.backend }}-${{ inputs.build-type }}-${{ inputs.cuda-major-version }}-${{ inputs.cuda-minor-version }}
-          flavor: |
-            latest=${{ inputs.tag-latest }}
-            suffix=${{ inputs.tag-suffix }},onlatest=true
-## End testing image
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@master
-        with:
-          platforms: all
-
-      - name: Set up Docker Buildx
-        id: buildx
-        uses: docker/setup-buildx-action@master
-
-      - name: Login to DockerHub
-        if: github.event_name != 'pull_request'
-        uses: docker/login-action@v4
-        with:
-          username: ${{ secrets.dockerUsername }}
-          password: ${{ secrets.dockerPassword }}
-
-      - name: Login to Quay.io
-        if: ${{ env.quay_username != '' }}
-        uses: docker/login-action@v4
-        with:
-          registry: quay.io
-          username: ${{ secrets.quayUsername }}
-          password: ${{ secrets.quayPassword }}
-
-      - name: Build and push
-        uses: docker/build-push-action@v7
-        if: github.event_name != 'pull_request'
-        with:
-          builder: ${{ steps.buildx.outputs.name }}
-          build-args: |
-            BUILD_TYPE=${{ inputs.build-type }}
-            SKIP_DRIVERS=${{ inputs.skip-drivers }}
-            CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
-            CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
-            BASE_IMAGE=${{ inputs.base-image }}
-            BACKEND=${{ inputs.backend }}
-            UBUNTU_VERSION=${{ inputs.ubuntu-version }}
-          context: ${{ inputs.context }}
-          file: ${{ inputs.dockerfile }}
-          cache-from: type=gha
-          platforms: ${{ inputs.platforms }}
-          push: ${{ github.event_name != 'pull_request' }}
-          tags: ${{ steps.meta.outputs.tags }}
-          labels: ${{ steps.meta.outputs.labels }}
-
-      - name: Build and push (PR)
-        uses: docker/build-push-action@v7
-        if: github.event_name == 'pull_request'
-        with:
-          builder: ${{ steps.buildx.outputs.name }}
-          build-args: |
-            BUILD_TYPE=${{ inputs.build-type }}
-            SKIP_DRIVERS=${{ inputs.skip-drivers }}
-            CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
-            CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
-            BASE_IMAGE=${{ inputs.base-image }}
-            BACKEND=${{ inputs.backend }}
-            UBUNTU_VERSION=${{ inputs.ubuntu-version }}
-          context: ${{ inputs.context }}
-          file: ${{ inputs.dockerfile }}
-          cache-from: type=gha
-          platforms: ${{ inputs.platforms }}
-          push: ${{ env.quay_username != '' }}
-          tags: ${{ steps.meta_pull_request.outputs.tags }}
-          labels: ${{ steps.meta_pull_request.outputs.labels }}
-
-
-
-      - name: job summary
-        run: |
-          echo "Built image: ${{ steps.meta.outputs.tags || steps.meta_pull_request.outputs.tags }}" >> $GITHUB_STEP_SUMMARY
--- a/.github/workflows/backend_build_darwin.yml
+++ b/.github/workflows/backend_build_darwin.yml
@@ -1,144 +0,0 @@
---
-name: 'build darwin python backend container images (reusable)'
-
-on:
-  workflow_call:
-    inputs:
-      backend:
-        description: 'Backend to build'
-        required: true
-        type: string
-      build-type:
-        description: 'Build type (e.g., mps)'
-        default: ''
-        type: string
-      use-pip:
-        description: 'Use pip to install dependencies'
-        default: false
-        type: boolean
-      lang:
-        description: 'Programming language (e.g. go)'
-        default: 'python'
-        type: string
-      go-version:
-        description: 'Go version to use'
-        default: '1.24.x'
-        type: string
-      tag-suffix:
-        description: 'Tag suffix for the built image'
-        required: true
-        type: string
-      runs-on:
-        description: 'Runner to use'
-        default: 'macOS-14'
-        type: string
-    secrets:
-      dockerUsername:
-        required: false
-      dockerPassword:
-        required: false
-      quayUsername:
-        required: true
-      quayPassword:
-        required: true
-
-jobs:
-  darwin-backend-build:
-    runs-on: ${{ inputs.runs-on }}
-    strategy:
-      matrix:
-        go-version: ['${{ inputs.go-version }}']
-    steps:
-      - name: Clone
-        uses: actions/checkout@v6
-        with:
-          submodules: true
-
-      - name: Setup Go ${{ matrix.go-version }}
-        uses: actions/setup-go@v5
-        with:
-          go-version: ${{ matrix.go-version }}
-          cache: false
-
-      # You can test your matrix by printing the current Go version
-      - name: Display Go version
-        run: go version
-
-      - name: Dependencies
-        run: |
-          brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm
-
-      - name: Build ${{ inputs.backend }}-darwin
-        run: |
-          make protogen-go
-          BACKEND=${{ inputs.backend }} BUILD_TYPE=${{ inputs.build-type }} USE_PIP=${{ inputs.use-pip }} make build-darwin-${{ inputs.lang }}-backend
-
-      - name: Upload ${{ inputs.backend }}.tar
-        uses: actions/upload-artifact@v7
-        with:
-          name: ${{ inputs.backend }}-tar
-          path: backend-images/${{ inputs.backend }}.tar
-
-  darwin-backend-publish:
-    needs: darwin-backend-build
-    if: github.event_name != 'pull_request'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Download ${{ inputs.backend }}.tar
-        uses: actions/download-artifact@v8
-        with:
-          name: ${{ inputs.backend }}-tar
-          path: .
-
-      - name: Install crane
-        run: |
-          curl -L https://github.com/google/go-containerregistry/releases/latest/download/go-containerregistry_Linux_x86_64.tar.gz | tar -xz
-          sudo mv crane /usr/local/bin/
-
-      - name: Log in to DockerHub
-        run: |
-          echo "${{ secrets.dockerPassword }}" | crane auth login docker.io -u "${{ secrets.dockerUsername }}" --password-stdin
-
-      - name: Log in to quay.io
-        run: |
-          echo "${{ secrets.quayPassword }}" | crane auth login quay.io -u "${{ secrets.quayUsername }}" --password-stdin
-
-      - name: Docker meta (DockerHub)
-        id: meta
-        uses: docker/metadata-action@v6
-        with:
-          images: |
-            localai/localai-backends
-          tags: |
-            type=ref,event=branch
-            type=semver,pattern={{raw}}
-            type=sha
-          flavor: |
-            latest=auto
-            suffix=${{ inputs.tag-suffix }},onlatest=true
-
-      - name: Docker meta (Quay)
-        id: quaymeta
-        uses: docker/metadata-action@v6
-        with:
-          images: |
-            quay.io/go-skynet/local-ai-backends
-          tags: |
-            type=ref,event=branch
-            type=semver,pattern={{raw}}
-            type=sha
-          flavor: |
-            latest=auto
-            suffix=${{ inputs.tag-suffix }},onlatest=true
-
-      - name: Push Docker image (DockerHub)
-        run: |
-          for tag in $(echo "${{ steps.meta.outputs.tags }}" | tr ',' '\n'); do
-            crane push ${{ inputs.backend }}.tar $tag
-          done
-
-      - name: Push Docker image (Quay)
-        run: |
-          for tag in $(echo "${{ steps.quaymeta.outputs.tags }}" | tr ',' '\n'); do
-            crane push ${{ inputs.backend }}.tar $tag
-          done
--- a/.github/workflows/backend_pr.yml
+++ b/.github/workflows/backend_pr.yml
@@ -1,79 +0,0 @@
-name: 'build backend container images (PR-filtered)'
-
-on:
-  pull_request:
-
-concurrency:
-  group: ci-backends-pr-${{ github.head_ref || github.ref }}-${{ github.repository }}
-  cancel-in-progress: true
-
-jobs:
-  generate-matrix:
-    runs-on: ubuntu-latest
-    outputs:
-      matrix: ${{ steps.set-matrix.outputs.matrix }}
-      matrix-darwin: ${{ steps.set-matrix.outputs.matrix-darwin }}
-      has-backends: ${{ steps.set-matrix.outputs.has-backends }}
-      has-backends-darwin: ${{ steps.set-matrix.outputs.has-backends-darwin }}
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v6
-
-      - name: Setup Bun
-        uses: oven-sh/setup-bun@v2
-
-      - name: Install dependencies
-        run: |
-          bun add js-yaml
-          bun add @octokit/core
-
-      # filters the matrix in backend.yml
-      - name: Filter matrix for changed backends
-        id: set-matrix
-        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          GITHUB_EVENT_PATH: ${{ github.event_path }}
-        run: bun run scripts/changed-backends.js
-
-  backend-jobs:
-    needs: generate-matrix
-    uses: ./.github/workflows/backend_build.yml
-    if: needs.generate-matrix.outputs.has-backends == 'true'
-    with:
-      tag-latest: ${{ matrix.tag-latest }}
-      tag-suffix: ${{ matrix.tag-suffix }}
-      build-type: ${{ matrix.build-type }}
-      cuda-major-version: ${{ matrix.cuda-major-version }}
-      cuda-minor-version: ${{ matrix.cuda-minor-version }}
-      platforms: ${{ matrix.platforms }}
-      runs-on: ${{ matrix.runs-on }}
-      base-image: ${{ matrix.base-image }}
-      backend: ${{ matrix.backend }}
-      dockerfile: ${{ matrix.dockerfile }}
-      skip-drivers: ${{ matrix.skip-drivers }}
-      context: ${{ matrix.context }}
-      ubuntu-version: ${{ matrix.ubuntu-version }}
-    secrets:
-      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-    strategy:
-      fail-fast: true
-      matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
-  backend-jobs-darwin:
-    needs: generate-matrix
-    uses: ./.github/workflows/backend_build_darwin.yml
-    if: needs.generate-matrix.outputs.has-backends-darwin == 'true'
-    with:
-      backend: ${{ matrix.backend }}
-      build-type: ${{ matrix.build-type }}
-      go-version: "1.24.x"
-      tag-suffix: ${{ matrix.tag-suffix }}
-      lang: ${{ matrix.lang || 'python' }}
-      use-pip: ${{ matrix.backend == 'diffusers' }}
-      runs-on: "macos-latest"
-    secrets:
-      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-    strategy:
-      fail-fast: true
-      matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix-darwin) }}
--- a/.github/workflows/build-test.yaml
+++ b/.github/workflows/build-test.yaml
@@ -1,67 +0,0 @@
-name: Build test
-
-on:
-  push:
-    branches:
-      - master
-  pull_request:
-
-jobs:
-  build-test:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v6
-        with:
-          fetch-depth: 0
-      - name: Set up Go
-        uses: actions/setup-go@v5
-        with:
-          go-version: 1.25
-      - name: Run GoReleaser
-        run: |
-          make dev-dist
-  launcher-build-darwin:
-    runs-on: macos-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v6
-        with:
-          fetch-depth: 0
-      - name: Set up Go
-        uses: actions/setup-go@v5
-        with:
-          go-version: 1.25
-      - name: Build launcher for macOS ARM64
-        run: |
-          make build-launcher-darwin
-          ls -liah dist
-      - name: Upload macOS launcher artifacts
-        uses: actions/upload-artifact@v7
-        with:
-          name: launcher-macos
-          path: dist/
-          retention-days: 30
-      
-  launcher-build-linux:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v6
-        with:
-          fetch-depth: 0
-      - name: Set up Go
-        uses: actions/setup-go@v5
-        with:
-          go-version: 1.25
-      - name: Build launcher for Linux
-        run: |
-          sudo apt-get update
-          sudo apt-get install golang gcc libgl1-mesa-dev xorg-dev libxkbcommon-dev
-          make build-launcher-linux
-      - name: Upload Linux launcher artifacts
-        uses: actions/upload-artifact@v7
-        with:
-          name: launcher-linux
-          path: local-ai-launcher-linux.tar.xz
-          retention-days: 30
--- a/.github/workflows/bump-inference-defaults.yml
+++ b/.github/workflows/bump-inference-defaults.yml
@@ -1,48 +0,0 @@
-name: Bump inference defaults
-
-on:
-  schedule:
-    # Run daily at 06:00 UTC
-    - cron: '0 6 * * *'
-  workflow_dispatch: # Allow manual trigger
-
-permissions:
-  contents: write
-  pull-requests: write
-
-jobs:
-  bump:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v6
-
-      - uses: actions/setup-go@v5
-        with:
-          go-version-file: go.mod
-
-      - name: Re-fetch inference defaults
-        run: make generate-force
-
-      - name: Check for changes
-        id: diff
-        run: |
-          if git diff --quiet core/config/inference_defaults.json; then
-            echo "changed=false" >> "$GITHUB_OUTPUT"
-          else
-            echo "changed=true" >> "$GITHUB_OUTPUT"
-          fi
-
-      - name: Create Pull Request
-        if: steps.diff.outputs.changed == 'true'
-        uses: peter-evans/create-pull-request@v8
-        with:
-          commit-message: "chore: bump inference defaults from unsloth"
-          title: "chore: bump inference defaults from unsloth"
-          body: |
-            Auto-generated update of `core/config/inference_defaults.json` from
-            [unsloth's inference_defaults.json](https://github.com/unslothai/unsloth/blob/main/studio/backend/assets/configs/inference_defaults.json).
-
-            This PR was created automatically by the `bump-inference-defaults` workflow.
-          branch: chore/bump-inference-defaults
-          delete-branch: true
-          labels: automated
--- a/.github/workflows/bump_deps.yaml
+++ b/.github/workflows/bump_deps.yaml
@@ -1,67 +1,62 @@
-name: Bump Backend dependencies
+name: Bump dependencies
 on:
  schedule:
    - cron: 0 20 * * *
  workflow_dispatch:
 jobs:
-  bump-backends:
-    if: github.repository == 'mudler/LocalAI'
+  bump:
    strategy:
      fail-fast: false
      matrix:
        include:
-          - repository: "ggml-org/llama.cpp"
-            variable: "LLAMA_VERSION"
+          - repository: "go-skynet/go-llama.cpp"
+            variable: "GOLLAMA_VERSION"
            branch: "master"
-            file: "backend/cpp/llama-cpp/Makefile"
-          - repository: "ggml-org/whisper.cpp"
+          - repository: "ggerganov/llama.cpp"
+            variable: "CPPLLAMA_VERSION"
+            branch: "master"
+          - repository: "go-skynet/go-ggml-transformers.cpp"
+            variable: "GOGGMLTRANSFORMERS_VERSION"
+            branch: "master"
+          - repository: "donomii/go-rwkv.cpp"
+            variable: "RWKV_VERSION"
+            branch: "main"
+          - repository: "ggerganov/whisper.cpp"
            variable: "WHISPER_CPP_VERSION"
            branch: "master"
-            file: "backend/go/whisper/Makefile"
-          - repository: "leejet/stable-diffusion.cpp"
-            variable: "STABLEDIFFUSION_GGML_VERSION"
+          - repository: "go-skynet/go-bert.cpp"
+            variable: "BERT_VERSION"
+            branch: "master"
+          - repository: "go-skynet/bloomz.cpp"
+            variable: "BLOOMZ_VERSION"
+            branch: "main"
+          - repository: "nomic-ai/gpt4all"
+            variable: "GPT4ALL_VERSION"
+            branch: "main"
+          - repository: "mudler/go-ggllm.cpp"
+            variable: "GOGGLLM_VERSION"
+            branch: "master"
+          - repository: "mudler/go-stable-diffusion"
+            variable: "STABLEDIFFUSION_VERSION"
            branch: "master"
-            file: "backend/go/stablediffusion-ggml/Makefile"
          - repository: "mudler/go-piper"
            variable: "PIPER_VERSION"
            branch: "master"
-            file: "backend/go/piper/Makefile"
-          - repository: "antirez/voxtral.c"
-            variable: "VOXTRAL_VERSION"
-            branch: "main"
-            file: "backend/go/voxtral/Makefile"
-          - repository: "ace-step/acestep.cpp"
-            variable: "ACESTEP_CPP_VERSION"
-            branch: "master"
-            file: "backend/go/acestep-cpp/Makefile"
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v6
+      - uses: actions/checkout@v4
      - name: Bump dependencies 🔧
-        id: bump
        run: |
-          bash .github/bump_deps.sh ${{ matrix.repository }} ${{ matrix.branch }} ${{ matrix.variable }} ${{ matrix.file }}
-          {
-            echo 'message<<EOF'
-            cat "${{ matrix.variable }}_message.txt"
-            echo EOF
-          } >> "$GITHUB_OUTPUT"
-          {
-            echo 'commit<<EOF'
-            cat "${{ matrix.variable }}_commit.txt"
-            echo EOF
-          } >> "$GITHUB_OUTPUT"
-          rm -rfv ${{ matrix.variable }}_message.txt
-          rm -rfv ${{ matrix.variable }}_commit.txt
+          bash .github/bump_deps.sh ${{ matrix.repository }} ${{ matrix.branch }} ${{ matrix.variable }}
      - name: Create Pull Request
-        uses: peter-evans/create-pull-request@v8
+        uses: peter-evans/create-pull-request@v6
        with:
          token: ${{ secrets.UPDATE_BOT_TOKEN }}
          push-to-fork: ci-forks/LocalAI
          commit-message: ':arrow_up: Update ${{ matrix.repository }}'
-          title: 'chore: :arrow_up: Update ${{ matrix.repository }} to `${{ steps.bump.outputs.commit }}`'
+          title: ':arrow_up: Update ${{ matrix.repository }}'
          branch: "update/${{ matrix.variable }}"
-          body: ${{ steps.bump.outputs.message }}
+          body: Bump of ${{ matrix.repository }} version
          signoff: true


--- a/.github/workflows/bump_docs.yaml
+++ b/.github/workflows/bump_docs.yaml
@@ -1,11 +1,10 @@
-name: Bump Documentation
+name: Bump dependencies
 on:
  schedule:
    - cron: 0 20 * * *
  workflow_dispatch:
 jobs:
-  bump-docs:
-    if: github.repository == 'mudler/LocalAI'
+  bump:
    strategy:
      fail-fast: false
      matrix:
@@ -13,17 +12,17 @@ jobs:
          - repository: "mudler/LocalAI"
    runs-on: ubuntu-latest
    steps:
-      - uses: actions/checkout@v6
+      - uses: actions/checkout@v4
      - name: Bump dependencies 🔧
        run: |
          bash .github/bump_docs.sh ${{ matrix.repository }}
      - name: Create Pull Request
-        uses: peter-evans/create-pull-request@v8
+        uses: peter-evans/create-pull-request@v6
        with:
          token: ${{ secrets.UPDATE_BOT_TOKEN }}
          push-to-fork: ci-forks/LocalAI
          commit-message: ':arrow_up: Update docs version ${{ matrix.repository }}'
-          title: 'docs: :arrow_up: update docs version ${{ matrix.repository }}'
+          title: ':arrow_up: Update docs version ${{ matrix.repository }}'
          branch: "update/docs"
          body: Bump of ${{ matrix.repository }} version inside docs
          signoff: true
--- a/.github/workflows/checksum_checker.yaml
+++ b/.github/workflows/checksum_checker.yaml
@@ -1,47 +0,0 @@
-name: Check if checksums are up-to-date
-on:
-  schedule:
-    - cron: 0 20 * * *
-  workflow_dispatch:
-jobs:
-  checksum_check:
-    if: github.repository == 'mudler/LocalAI'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Force Install GIT latest
-        run: |
-          sudo apt-get update \
-          && sudo apt-get install -y software-properties-common \
-          && sudo apt-get update \
-          && sudo add-apt-repository -y ppa:git-core/ppa \
-          && sudo apt-get update \
-          && sudo apt-get install -y git
-      - uses: actions/checkout@v6
-      - name: Install dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y pip wget
-          pip install huggingface_hub
-      - name: 'Setup yq'
-        uses: dcarbone/install-yq-action@v1.3.1
-        with:
-          version: 'v4.44.2'
-          download-compressed: true
-          force: true
-
-      - name: Checksum checker 🔧
-        run: |
-          export HF_HOME=/hf_cache
-          sudo mkdir /hf_cache
-          sudo chmod 777 /hf_cache
-          bash .github/checksum_checker.sh gallery/index.yaml
-      - name: Create Pull Request
-        uses: peter-evans/create-pull-request@v8
-        with:
-          token: ${{ secrets.UPDATE_BOT_TOKEN }}
-          push-to-fork: ci-forks/LocalAI
-          commit-message: ':arrow_up: Checksum updates in gallery/index.yaml'
-          title: 'chore(model-gallery): :arrow_up: update checksum'
-          branch: "update/checksum"
-          body: Updating checksums in gallery/index.yaml
-          signoff: true
--- a/.github/workflows/disabled/dependabot_auto.yml
+++ b/.github/workflows/disabled/dependabot_auto.yml
@@ -9,18 +9,18 @@ permissions:

 jobs:
  dependabot:
-    if: github.repository == 'mudler/LocalAI' && github.actor == 'dependabot[bot]'
    runs-on: ubuntu-latest
+    if: ${{ github.actor == 'dependabot[bot]' }}
    steps:
      - name: Dependabot metadata
        id: metadata
-        uses: dependabot/fetch-metadata@v2.5.0
+        uses: dependabot/fetch-metadata@v2.0.0
        with:
          github-token: "${{ secrets.GITHUB_TOKEN }}"
          skip-commit-verification: true

      - name: Checkout repository
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4

      - name: Approve a PR if not already approved
        run: |
--- a/.github/workflows/deploy-explorer.yaml
+++ b/.github/workflows/deploy-explorer.yaml
@@ -1,65 +0,0 @@
-name: Explorer deployment
-
-on:
-  push:
-    branches:
-      - master
-    tags:
-      - 'v*'
-
-concurrency:
-  group: ci-deploy-${{ github.head_ref || github.ref }}-${{ github.repository }}
-
-jobs:
-  build-linux:
-    if: github.repository == 'mudler/LocalAI'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Clone
-        uses: actions/checkout@v6
-        with:
-          submodules: true
-      - uses: actions/setup-go@v5
-        with:
-          go-version: '1.21.x'
-          cache: false
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y wget curl build-essential ffmpeg protobuf-compiler ccache upx-ucl gawk cmake libgmock-dev
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-          make protogen-go
-      - name: Build api
-        run: |
-          CGO_ENABLED=0 make build
-      - name: rm
-        uses: appleboy/ssh-action@v1.2.5
-        with:
-            host: ${{ secrets.EXPLORER_SSH_HOST }}
-            username: ${{ secrets.EXPLORER_SSH_USERNAME }}
-            key: ${{ secrets.EXPLORER_SSH_KEY }}
-            port: ${{ secrets.EXPLORER_SSH_PORT }}
-            script: |
-                sudo rm -rf local-ai/ || true
-      - name: copy file via ssh
-        uses: appleboy/scp-action@v1.0.0
-        with:
-            host: ${{ secrets.EXPLORER_SSH_HOST }}
-            username: ${{ secrets.EXPLORER_SSH_USERNAME }}
-            key: ${{ secrets.EXPLORER_SSH_KEY }}
-            port: ${{ secrets.EXPLORER_SSH_PORT }}
-            source: "local-ai"
-            overwrite: true
-            rm: true
-            target: ./local-ai
-      - name: restarting
-        uses: appleboy/ssh-action@v1.2.5
-        with:
-            host: ${{ secrets.EXPLORER_SSH_HOST }}
-            username: ${{ secrets.EXPLORER_SSH_USERNAME }}
-            key: ${{ secrets.EXPLORER_SSH_KEY }}
-            port: ${{ secrets.EXPLORER_SSH_PORT }}
-            script: |
-                sudo cp -rfv local-ai/local-ai /usr/bin/local-ai
-                sudo systemctl restart local-ai
--- a/.github/workflows/disabled/comment-pr.yaml
+++ b/.github/workflows/disabled/comment-pr.yaml
@@ -1,83 +0,0 @@
-name: Comment PRs
-on:
-  pull_request_target:
-
-jobs:
-  comment-pr:
-    env:
-        MODEL_NAME: hermes-2-theta-llama-3-8b
-    runs-on: ubuntu-latest
-    steps:
-    - name: Checkout code
-      uses: actions/checkout@v3
-      with:
-        ref: "${{ github.event.pull_request.merge_commit_sha }}"
-        fetch-depth: 0 # needed to checkout all branches for this Action to work
-    - uses: mudler/localai-github-action@v1
-      with:
-        model: 'hermes-2-theta-llama-3-8b' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
-      # Check the PR diff using the current branch and the base branch of the PR
-    - uses: GrantBirki/git-diff-action@v2.7.0
-      id: git-diff-action
-      with:
-            json_diff_file_output: diff.json
-            raw_diff_file_output: diff.txt
-            file_output_only: "true"
-            base_branch: ${{ github.event.pull_request.base.sha }}
-    - name: Show diff
-      env:
-        DIFF: ${{ steps.git-diff-action.outputs.raw-diff-path }}
-      run: |
-            cat $DIFF
-    - name: Summarize
-      env:
-        DIFF: ${{ steps.git-diff-action.outputs.raw-diff-path }}
-      id: summarize
-      run: |
-            input="$(cat $DIFF)"
-
-            # Define the LocalAI API endpoint
-            API_URL="http://localhost:8080/chat/completions"
-
-            # Create a JSON payload using jq to handle special characters
-            json_payload=$(jq -n --arg input "$input" '{
-            model: "'$MODEL_NAME'",
-            messages: [
-                {
-                role: "system",
-                content: "You are LocalAI-bot in Github that helps understanding PRs and assess complexity. Explain what has changed in this PR diff and why"
-                },
-                {
-                role: "user",
-                content: $input
-                }
-            ]
-            }')
-
-            # Send the request to LocalAI
-            response=$(curl -s -X POST $API_URL \
-            -H "Content-Type: application/json" \
-            -d "$json_payload")
-
-            # Extract the summary from the response
-            summary="$(echo $response | jq -r '.choices[0].message.content')"
-
-            # Print the summary
-            #  -H "Authorization: Bearer $API_KEY" \
-            echo "Summary:"
-            echo "$summary"
-            echo "payload sent"
-            echo "$json_payload"
-            {
-                echo 'message<<EOF'
-                echo "$summary"
-                echo EOF
-              } >> "$GITHUB_OUTPUT"
-            docker logs --tail 10 local-ai
-    - uses: mshick/add-pr-comment@v2
-      if: always()
-      with:
-          repo-token: ${{ secrets.UPDATE_BOT_TOKEN }}
-          message: ${{ steps.summarize.outputs.message }}
-          message-failure: |
-            Uh oh! Could not analyze this PR, maybe it's too big?
--- a/.github/workflows/disabled/notify-models.yaml
+++ b/.github/workflows/disabled/notify-models.yaml
@@ -1,174 +0,0 @@
-name: Notifications for new models
-on:
-  pull_request_target:
-     types:
-       - closed
-
-permissions:
-  contents: read
-  pull-requests: read
-
-jobs:
-  notify-discord:
-    if: github.repository == 'mudler/LocalAI' && (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model'))
-    env:
-        MODEL_NAME: gemma-3-12b-it-qat
-    runs-on: ubuntu-latest
-    steps:
-    - uses: actions/checkout@v6
-      with:
-        fetch-depth: 0 # needed to checkout all branches for this Action to work
-        ref: ${{ github.event.pull_request.head.sha }} # Checkout the PR head to get the actual changes
-    - uses: mudler/localai-github-action@v1
-      with:
-        model: 'gemma-3-12b-it-qat' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
-        # Check the PR diff using the current branch and the base branch of the PR
-    - uses: GrantBirki/git-diff-action@v2.8.1
-      id: git-diff-action
-      with:
-            json_diff_file_output: diff.json
-            raw_diff_file_output: diff.txt
-            file_output_only: "true"
-    - name: Summarize
-      env:
-        DIFF: ${{ steps.git-diff-action.outputs.raw-diff-path }}
-      id: summarize
-      run: |
-            input="$(cat $DIFF)"
-
-            # Define the LocalAI API endpoint
-            API_URL="http://localhost:8080/chat/completions"
-
-            # Create a JSON payload using jq to handle special characters
-            json_payload=$(jq -n --arg input "$input" '{
-            model: "'$MODEL_NAME'",
-            messages: [
-                {
-                role: "system",
-                content: "You are LocalAI-bot. Write a Discord message to notify everyone about the new model from the git diff. Make it informal. An example can include: the URL of the model, the name, and a brief description of the model if one exists. Also add a hint on how to install it in LocalAI, and mention that models can be browsed at https://models.localai.io. For example: local-ai run model_name_here"
-                },
-                {
-                role: "user",
-                content: $input
-                }
-            ]
-            }')
-
-            # Send the request to LocalAI
-            response=$(curl -s -X POST $API_URL \
-            -H "Content-Type: application/json" \
-            -d "$json_payload")
-
-            # Extract the summary from the response
-            summary="$(echo $response | jq -r '.choices[0].message.content')"
-
-            # Print the summary
-            #  -H "Authorization: Bearer $API_KEY" \
-            echo "Summary:"
-            echo "$summary"
-            echo "payload sent"
-            echo "$json_payload"
-            {
-                echo 'message<<EOF'
-                echo "$summary"
-                echo EOF
-              } >> "$GITHUB_OUTPUT"
-            docker logs --tail 10 local-ai
-    - name: Discord notification
-      env:
-        DISCORD_WEBHOOK: ${{ secrets.DISCORD_WEBHOOK_URL }}
-        DISCORD_USERNAME: "LocalAI-Bot"
-        DISCORD_AVATAR: "https://avatars.githubusercontent.com/u/139863280?v=4"
-      uses: Ilshidur/action-discord@master
-      with:
-        args: ${{ steps.summarize.outputs.message }}
-    - name: Setup tmate session if fails
-      if: ${{ failure() }}
-      uses: mxschmitt/action-tmate@v3.23
-      with:
-        detached: true
-        connect-timeout-seconds: 180
-        limit-access-to-actor: true
-  notify-twitter:
-    if: github.repository == 'mudler/LocalAI' && (github.event.pull_request.merged == true) && (contains(github.event.pull_request.labels.*.name, 'area/ai-model'))
-    env:
-        MODEL_NAME: gemma-3-12b-it-qat
-    runs-on: ubuntu-latest
-    steps:
-    - uses: actions/checkout@v6
-      with:
-        fetch-depth: 0 # needed to checkout all branches for this Action to work
-        ref: ${{ github.event.pull_request.head.sha }} # Checkout the PR head to get the actual changes
-    - name: Start LocalAI
-      run: |
-        echo "Starting LocalAI..."
-        docker run -e -ti -d --name local-ai -p 8080:8080 localai/localai:master run --debug $MODEL_NAME
-        until [ "`docker inspect -f {{.State.Health.Status}} local-ai`" == "healthy" ]; do echo "Waiting for container to be ready";  docker logs --tail 10 local-ai; sleep 2; done
-      # Check the PR diff using the current branch and the base branch of the PR
-    - uses: GrantBirki/git-diff-action@v2.8.1
-      id: git-diff-action
-      with:
-            json_diff_file_output: diff.json
-            raw_diff_file_output: diff.txt
-            file_output_only: "true"
-    - name: Summarize
-      env:
-        DIFF: ${{ steps.git-diff-action.outputs.raw-diff-path }}
-      id: summarize
-      run: |
-            input="$(cat $DIFF)"
-
-            # Define the LocalAI API endpoint
-            API_URL="http://localhost:8080/chat/completions"
-
-            # Create a JSON payload using jq to handle special characters
-            json_payload=$(jq -n --arg input "$input" '{
-            model: "'$MODEL_NAME'",
-            messages: [
-                {
-                role: "system",
-                content: "You are LocalAI-bot. Write a Twitter message to notify everyone about the new model from the git diff. Make it informal and really short. An example can include: the name, and a brief description of the model if one exists. Also add a hint on how to install it in LocalAI. For example: local-ai run model_name_here"
-                },
-                {
-                role: "user",
-                content: $input
-                }
-            ]
-            }')
-
-            # Send the request to LocalAI
-            response=$(curl -s -X POST $API_URL \
-            -H "Content-Type: application/json" \
-            -d "$json_payload")
-
-            # Extract the summary from the response
-            summary="$(echo $response | jq -r '.choices[0].message.content')"
-
-            # Print the summary
-            #  -H "Authorization: Bearer $API_KEY" \
-            echo "Summary:"
-            echo "$summary"
-            echo "payload sent"
-            echo "$json_payload"
-            {
-                echo 'message<<EOF'
-                echo "$summary"
-                echo EOF
-              } >> "$GITHUB_OUTPUT"
-            docker logs --tail 10 local-ai
-    - uses: Eomm/why-don-t-you-tweet@v2
-      with:
-        tweet-message: ${{ steps.summarize.outputs.message }}
-      env:
-        # Get your tokens from https://developer.twitter.com/apps
-        TWITTER_CONSUMER_API_KEY: ${{ secrets.TWITTER_APP_KEY }}
-        TWITTER_CONSUMER_API_SECRET: ${{ secrets.TWITTER_APP_SECRET }}
-        TWITTER_ACCESS_TOKEN: ${{ secrets.TWITTER_ACCESS_TOKEN }}
-        TWITTER_ACCESS_TOKEN_SECRET: ${{ secrets.TWITTER_ACCESS_TOKEN_SECRET }}
-    - name: Setup tmate session if fails
-      if: ${{ failure() }}
-      uses: mxschmitt/action-tmate@v3.23
-      with:
-        detached: true
-        connect-timeout-seconds: 180
-        limit-access-to-actor: true
--- a/.github/workflows/disabled/prlint.yaml
+++ b/.github/workflows/disabled/prlint.yaml
@@ -1,28 +0,0 @@
-name: Check PR style
-
-on:
-  pull_request_target:
-    types:
-      - opened
-      - reopened
-      - edited
-      - synchronize
-
-jobs:
-  title-lint:
-    runs-on: ubuntu-latest
-    permissions:
-      statuses: write
-    steps:
-      - uses: aslafy-z/conventional-pr-title-action@v3
-        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-#  check-pr-description:
-#    runs-on: ubuntu-latest
-#    steps:
-#      - uses: actions/checkout@v2
-#      - uses: jadrol/pr-description-checker-action@v1.0.0
-#        id: description-checker
-#        with:
-#          repo-token: ${{ secrets.GITHUB_TOKEN }}
-#          exempt-labels: no qa
--- a/.github/workflows/gallery-agent.yaml
+++ b/.github/workflows/gallery-agent.yaml
@@ -1,133 +0,0 @@
-name: Gallery Agent
-on:
-
-  schedule:
-    - cron: '0 */3 * * *'  # Run every 3 hours
-  workflow_dispatch:
-    inputs:
-      search_term:
-        description: 'Search term for models'
-        required: false
-        default: 'GGUF'
-        type: string
-      limit:
-        description: 'Maximum number of models to process'
-        required: false
-        default: '15'
-        type: string
-      quantization:
-        description: 'Preferred quantization format'
-        required: false
-        default: 'Q4_K_M'
-        type: string
-      max_models:
-        description: 'Maximum number of models to add to the gallery'
-        required: false
-        default: '1'
-        type: string
-jobs:
-  gallery-agent:
-    if: github.repository == 'mudler/LocalAI'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v6
-        with:
-          token: ${{ secrets.GITHUB_TOKEN }}
-
-      - name: Set up Go
-        uses: actions/setup-go@v5
-        with:
-          go-version: '1.21'
-      - name: Proto Dependencies
-        run: |
-          # Install protoc
-          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
-          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
-          rm protoc.zip
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-          PATH="$PATH:$HOME/go/bin" make protogen-go
-      - uses: mudler/localai-github-action@v1.1
-        with:
-          model: 'https://huggingface.co/unsloth/Qwen3.5-2B-GGUF'
-
-      - name: Run gallery agent
-        env:
-          #OPENAI_MODEL: ${{ secrets.OPENAI_MODEL }}
-          OPENAI_MODEL: Qwen3.5-2B-GGUF
-          OPENAI_BASE_URL: "http://localhost:8080"
-          OPENAI_KEY: ${{ secrets.OPENAI_KEY }}
-          #OPENAI_BASE_URL: ${{ secrets.OPENAI_BASE_URL }}
-          SEARCH_TERM: ${{ github.event.inputs.search_term || 'GGUF' }}
-          LIMIT: ${{ github.event.inputs.limit || '15' }}
-          QUANTIZATION: ${{ github.event.inputs.quantization || 'Q4_K_M' }}
-          MAX_MODELS: ${{ github.event.inputs.max_models || '1' }}
-        run: |
-          export GALLERY_INDEX_PATH=$PWD/gallery/index.yaml
-          go run ./.github/gallery-agent
-
-      - name: Check for changes
-        id: check_changes
-        run: |
-          if git diff --quiet gallery/index.yaml; then
-            echo "changes=false" >> $GITHUB_OUTPUT
-            echo "No changes detected in gallery/index.yaml"
-          else
-            echo "changes=true" >> $GITHUB_OUTPUT
-            echo "Changes detected in gallery/index.yaml"
-            git diff gallery/index.yaml
-          fi
-
-      - name: Read gallery agent summary
-        id: read_summary
-        if: steps.check_changes.outputs.changes == 'true'
-        run: |
-          if [ -f "./gallery-agent-summary.json" ]; then
-            echo "summary_exists=true" >> $GITHUB_OUTPUT
-            # Extract summary data using jq
-            echo "search_term=$(jq -r '.search_term' ./gallery-agent-summary.json)" >> $GITHUB_OUTPUT
-            echo "total_found=$(jq -r '.total_found' ./gallery-agent-summary.json)" >> $GITHUB_OUTPUT
-            echo "models_added=$(jq -r '.models_added' ./gallery-agent-summary.json)" >> $GITHUB_OUTPUT
-            echo "quantization=$(jq -r '.quantization' ./gallery-agent-summary.json)" >> $GITHUB_OUTPUT
-            echo "processing_time=$(jq -r '.processing_time' ./gallery-agent-summary.json)" >> $GITHUB_OUTPUT
-            
-            # Create a formatted list of added models with URLs
-            added_models=$(jq -r 'range(0; .added_model_ids | length) as $i | "- [\(.added_model_ids[$i])](\(.added_model_urls[$i]))"' ./gallery-agent-summary.json | tr '\n' '\n')
-            echo "added_models<<EOF" >> $GITHUB_OUTPUT
-            echo "$added_models" >> $GITHUB_OUTPUT
-            echo "EOF" >> $GITHUB_OUTPUT
-            rm -f ./gallery-agent-summary.json
-          else
-            echo "summary_exists=false" >> $GITHUB_OUTPUT
-          fi
-
-      - name: Create Pull Request
-        if: steps.check_changes.outputs.changes == 'true'
-        uses: peter-evans/create-pull-request@v8
-        with:
-          token: ${{ secrets.UPDATE_BOT_TOKEN }}
-          push-to-fork: ci-forks/LocalAI
-          commit-message: 'chore(model gallery): :robot: add new models via gallery agent'
-          title: 'chore(model gallery): :robot: add ${{ steps.read_summary.outputs.models_added || 0 }} new models via gallery agent'
-          # Branch has to be unique so PRs are not overriding each other
-          branch-suffix: timestamp
-          body: |
-            This PR was automatically created by the gallery agent workflow.
-            
-            **Summary:**
-            - **Search Term:** ${{ steps.read_summary.outputs.search_term || github.event.inputs.search_term || 'GGUF' }}
-            - **Models Found:** ${{ steps.read_summary.outputs.total_found || 'N/A' }}
-            - **Models Added:** ${{ steps.read_summary.outputs.models_added || '0' }}
-            - **Quantization:** ${{ steps.read_summary.outputs.quantization || github.event.inputs.quantization || 'Q4_K_M' }}
-            - **Processing Time:** ${{ steps.read_summary.outputs.processing_time || 'N/A' }}
-            
-            **Added Models:**
-            ${{ steps.read_summary.outputs.added_models || '- No models added' }}
-            
-            **Workflow Details:**
-            - Triggered by: `${{ github.event_name }}`
-            - Run ID: `${{ github.run_id }}`
-            - Commit: `${{ github.sha }}`
-          signoff: true
-          delete-branch: true
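
The removed `Read gallery agent summary` step writes `added_models` with the delimiter form (`name<<EOF`, the value lines, then `EOF`), which is how a multi-line value is passed as a step output. A minimal sketch of that pattern for a local run, assuming a temporary file stands in for `$GITHUB_OUTPUT` and that the two list entries are hypothetical placeholders for what `jq` would produce:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: outside of GitHub Actions there is no $GITHUB_OUTPUT,
# so a temporary file stands in for it here.
GITHUB_OUTPUT="$(mktemp)"

# Hypothetical multi-line value; in the workflow this comes from jq.
added_models=$'- [model-a](https://example.com/a)\n- [model-b](https://example.com/b)'

# Delimiter form: "name<<DELIM", the value lines, then "DELIM" on its own line.
{
  echo "added_models<<EOF"
  echo "$added_models"
  echo "EOF"
} >> "$GITHUB_OUTPUT"

# Show what a later step would receive as the raw output file.
cat "$GITHUB_OUTPUT"
```

The `tr '\n' '\n'` in the removed step maps each newline to itself, so the sketch leaves it out. A fixed delimiter such as `EOF` only works while no value line equals the delimiter.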
--- a/.github/workflows/generate_grpc_cache.yaml
+++ b/.github/workflows/generate_grpc_cache.yaml
@@ -1,96 +0,0 @@
-name: 'generate and publish GRPC docker caches'
-
-on:
-  workflow_dispatch:
-
-  schedule:
-    # daily at midnight
-    - cron: '0 0 * * *'
-
-concurrency:
-  group: grpc-cache-${{ github.head_ref || github.ref }}-${{ github.repository }}
-  cancel-in-progress: true
-
-jobs:
-  generate_caches:
-    if: github.repository == 'mudler/LocalAI'
-    strategy:
-      matrix:
-        include:
-          - grpc-base-image: ubuntu:24.04
-            runs-on: 'ubuntu-latest'
-            platforms: 'linux/amd64,linux/arm64'
-    runs-on: ${{matrix.runs-on}}
-    steps:
-      - name: Release space from worker
-        if: matrix.runs-on == 'ubuntu-latest'
-        run: |
-          echo "Listing top largest packages"
-          pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
-          head -n 30 <<< "${pkgs}"
-          echo
-          df -h
-          echo
-          sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
-          sudo apt-get remove --auto-remove android-sdk-platform-tools || true
-          sudo apt-get purge --auto-remove android-sdk-platform-tools || true
-          sudo rm -rf /usr/local/lib/android
-          sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
-          sudo rm -rf /usr/share/dotnet
-          sudo apt-get remove -y '^mono-.*' || true
-          sudo apt-get remove -y '^ghc-.*' || true
-          sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
-          sudo apt-get remove -y 'php.*' || true
-          sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
-          sudo apt-get remove -y '^google-.*' || true
-          sudo apt-get remove -y azure-cli || true
-          sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
-          sudo apt-get remove -y '^gfortran-.*' || true
-          sudo apt-get remove -y microsoft-edge-stable || true
-          sudo apt-get remove -y firefox || true
-          sudo apt-get remove -y powershell || true
-          sudo apt-get remove -y r-base-core || true
-          sudo apt-get autoremove -y
-          sudo apt-get clean
-          echo
-          echo "Listing top largest packages"
-          pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
-          head -n 30 <<< "${pkgs}"
-          echo
-          sudo rm -rfv build || true
-          sudo rm -rf /usr/share/dotnet || true
-          sudo rm -rf /opt/ghc || true
-          sudo rm -rf "/usr/local/share/boost" || true
-          sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
-          df -h
-
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@master
-        with:
-          platforms: all
-
-      - name: Set up Docker Buildx
-        id: buildx
-        uses: docker/setup-buildx-action@master
-
-      - name: Checkout
-        uses: actions/checkout@v6
-
-      - name: Cache GRPC
-        uses: docker/build-push-action@v7
-        with:
-          builder: ${{ steps.buildx.outputs.name }}
-          # The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
-          # This means that even the MAKEFLAGS have to be an EXACT match.
-          # If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
-          build-args: |
-            GRPC_BASE_IMAGE=${{ matrix.grpc-base-image }}
-            GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
-            GRPC_VERSION=v1.65.0
-          context: .
-          file: ./Dockerfile
-          cache-to: type=gha,ignore-error=true
-          cache-from: type=gha
-          target: grpc
-          platforms: ${{ matrix.platforms }}
-          push: false
--- a/.github/workflows/generate_intel_image.yaml
+++ b/.github/workflows/generate_intel_image.yaml
@@ -1,60 +0,0 @@
-name: 'generate and publish intel docker caches'
-
-on:
-  workflow_dispatch:
-  push:
-    branches:
-      - master
-
-concurrency:
-  group: intel-cache-${{ github.head_ref || github.ref }}-${{ github.repository }}
-  cancel-in-progress: true
-
-jobs:
-  generate_caches:
-    if: github.repository == 'mudler/LocalAI'
-    strategy:
-      matrix:
-        include:
-          - base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
-            runs-on: 'arc-runner-set'
-            platforms: 'linux/amd64'
-    runs-on: ${{matrix.runs-on}}
-    steps:
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@master
-        with:
-          platforms: all
-      - name: Login to DockerHub
-        if: github.event_name != 'pull_request'
-        uses: docker/login-action@v4
-        with:
-          username: ${{ secrets.DOCKERHUB_USERNAME }}
-          password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
-      - name: Login to quay
-        if: github.event_name != 'pull_request'
-        uses: docker/login-action@v4
-        with:
-          registry: quay.io
-          username: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-          password: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-      - name: Set up Docker Buildx
-        id: buildx
-        uses: docker/setup-buildx-action@master
-
-      - name: Checkout
-        uses: actions/checkout@v6
-
-      - name: Cache Intel images
-        uses: docker/build-push-action@v7
-        with:
-          builder: ${{ steps.buildx.outputs.name }}
-          build-args: |
-            BASE_IMAGE=${{ matrix.base-image }}
-          context: .
-          file: ./Dockerfile
-          tags: quay.io/go-skynet/intel-oneapi-base:24.04
-          push: true
-          target: intel
-          platforms: ${{ matrix.platforms }}
--- a/.github/workflows/gh-pages.yml
+++ b/.github/workflows/gh-pages.yml
@@ -1,75 +0,0 @@
-name: Deploy docs to GitHub Pages
-
-on:
-  push:
-    branches:
-      - master
-    paths:
-      - 'docs/**'
-      - 'gallery/**'
-      - 'images/**'
-      - '.github/ci/modelslist.go'
-      - '.github/workflows/gh-pages.yml'
-  workflow_dispatch:
-
-permissions:
-  contents: read
-  pages: write
-  id-token: write
-
-concurrency:
-  group: pages
-  cancel-in-progress: false
-
-jobs:
-  build:
-    runs-on: ubuntu-latest
-    env:
-      HUGO_VERSION: "0.146.3"
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v6
-        with:
-          fetch-depth: 0  # needed for enableGitInfo
-          submodules: true
-
-      - name: Setup Go
-        uses: actions/setup-go@v5
-        with:
-          go-version: '1.22'
-          cache: false
-
-      - name: Setup Hugo
-        uses: peaceiris/actions-hugo@v3
-        with:
-          hugo-version: ${{ env.HUGO_VERSION }}
-          extended: true
-
-      - name: Setup Pages
-        id: pages
-        uses: actions/configure-pages@v6
-
-      - name: Generate gallery
-        run: go run ./.github/ci/modelslist.go ./gallery/index.yaml > docs/static/gallery.html
-
-      - name: Build site
-        working-directory: docs
-        run: |
-          mkdir -p layouts/_default
-          hugo --minify --baseURL "${{ steps.pages.outputs.base_url }}/"
-
-      - name: Upload artifact
-        uses: actions/upload-pages-artifact@v4
-        with:
-          path: docs/public
-
-  deploy:
-    environment:
-      name: github-pages
-      url: ${{ steps.deployment.outputs.page_url }}
-    runs-on: ubuntu-latest
-    needs: build
-    steps:
-      - name: Deploy to GitHub Pages
-        id: deployment
-        uses: actions/deploy-pages@v5
--- a/.github/workflows/image-pr.yml
+++ b/.github/workflows/image-pr.yml
@@ -1,95 +1,125 @@
 ---
-  name: 'build container images tests'
-  
-  on:
-    pull_request:
-  
-  concurrency:
-    group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
-    cancel-in-progress: true
-  
-  jobs:
-    image-build:
-      uses: ./.github/workflows/image_build.yml
-      with:
-        tag-latest: ${{ matrix.tag-latest }}
-        tag-suffix: ${{ matrix.tag-suffix }}
-        build-type: ${{ matrix.build-type }}
-        cuda-major-version: ${{ matrix.cuda-major-version }}
-        cuda-minor-version: ${{ matrix.cuda-minor-version }}
-        platforms: ${{ matrix.platforms }}
-        runs-on: ${{ matrix.runs-on }}
-        base-image: ${{ matrix.base-image }}
-        grpc-base-image: ${{ matrix.grpc-base-image }}
-        makeflags: ${{ matrix.makeflags }}
-        ubuntu-version: ${{ matrix.ubuntu-version }}
-      secrets:
-        dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
-        dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
-        quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-        quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-      strategy:
-        # Pushing with all jobs in parallel
-        # eats the bandwidth of all the nodes
-        max-parallel: ${{ github.event_name != 'pull_request' && 4 || 8 }}
-        fail-fast: false
-        matrix:
-          include:
-            - build-type: 'cublas'
-              cuda-major-version: "12"
-              cuda-minor-version: "8"
-              platforms: 'linux/amd64'
-              tag-latest: 'false'
-              tag-suffix: '-gpu-nvidia-cuda-12'
-              runs-on: 'ubuntu-latest'
-              base-image: "ubuntu:24.04"
-              makeflags: "--jobs=3 --output-sync=target"
-              ubuntu-version: '2404'
-            - build-type: 'cublas'
-              cuda-major-version: "13"
-              cuda-minor-version: "0"
-              platforms: 'linux/amd64'
-              tag-latest: 'false'
-              tag-suffix: '-gpu-nvidia-cuda-13'
-              runs-on: 'ubuntu-latest'
-              base-image: "ubuntu:22.04"
-              makeflags: "--jobs=3 --output-sync=target"
-              ubuntu-version: '2404'
-            - build-type: 'hipblas'
-              platforms: 'linux/amd64'
-              tag-latest: 'false'
-              tag-suffix: '-hipblas'
-              base-image: "rocm/dev-ubuntu-24.04:6.4.4"
-              grpc-base-image: "ubuntu:24.04"
-              runs-on: 'ubuntu-latest'
-              makeflags: "--jobs=3 --output-sync=target"
-              ubuntu-version: '2404'
-            - build-type: 'sycl'
-              platforms: 'linux/amd64'
-              tag-latest: 'false'
-              base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
-              grpc-base-image: "ubuntu:24.04"
-              tag-suffix: 'sycl'
-              runs-on: 'ubuntu-latest'
-              makeflags: "--jobs=3 --output-sync=target"
-              ubuntu-version: '2404'
-            - build-type: 'vulkan'
-              platforms: 'linux/amd64,linux/arm64'
-              tag-latest: 'false'
-              tag-suffix: '-vulkan-core'
-              runs-on: 'ubuntu-latest'
-              base-image: "ubuntu:24.04"
-              makeflags: "--jobs=4 --output-sync=target"
-              ubuntu-version: '2404'
-            - build-type: 'cublas'
-              cuda-major-version: "13"
-              cuda-minor-version: "0"
-              platforms: 'linux/arm64'
-              tag-latest: 'false'
-              tag-suffix: '-nvidia-l4t-arm64-cuda-13'
-              base-image: "ubuntu:24.04"
-              runs-on: 'ubuntu-24.04-arm'
-              makeflags: "--jobs=4 --output-sync=target"
-              skip-drivers: 'false'
-              ubuntu-version: '2404'
-  
+name: 'build container images tests'
+
+on:
+  pull_request:
+
+concurrency:
+  group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
+  cancel-in-progress: true
+
+jobs:
+  extras-image-build:
+    uses: ./.github/workflows/image_build.yml
+    with:
+      tag-latest: ${{ matrix.tag-latest }}
+      tag-suffix: ${{ matrix.tag-suffix }}
+      ffmpeg: ${{ matrix.ffmpeg }}
+      image-type: ${{ matrix.image-type }}
+      build-type: ${{ matrix.build-type }}
+      cuda-major-version: ${{ matrix.cuda-major-version }}
+      cuda-minor-version: ${{ matrix.cuda-minor-version }}
+      platforms: ${{ matrix.platforms }}
+      runs-on: ${{ matrix.runs-on }}
+      base-image: ${{ matrix.base-image }}
+      makeflags: ${{ matrix.makeflags }}
+    secrets:
+      dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
+      dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
+      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+    strategy:
+      # Pushing with all jobs in parallel
+      # eats the bandwidth of all the nodes
+      max-parallel: ${{ github.event_name != 'pull_request' && 2 || 4 }}
+      matrix:
+        include:
+          - build-type: ''
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-ffmpeg'
+            ffmpeg: 'true'
+            image-type: 'extras'
+            runs-on: 'arc-runner-set'
+            base-image: "ubuntu:22.04"
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'cublas'
+            cuda-major-version: "12"
+            cuda-minor-version: "1"
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-cublas-cuda12-ffmpeg'
+            ffmpeg: 'true'
+            image-type: 'extras'
+            runs-on: 'arc-runner-set'
+            base-image: "ubuntu:22.04"
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'hipblas'
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-hipblas'
+            ffmpeg: 'false'
+            image-type: 'extras'
+            base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+            runs-on: 'arc-runner-set'
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'sycl_f16'
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            base-image: "intel/oneapi-basekit:2024.0.1-devel-ubuntu22.04"
+            tag-suffix: 'sycl-f16-ffmpeg'
+            ffmpeg: 'true'
+            image-type: 'extras'
+            runs-on: 'arc-runner-set'
+            makeflags: "--jobs=3 --output-sync=target"
+  core-image-build:
+    uses: ./.github/workflows/image_build.yml
+    with:
+      tag-latest: ${{ matrix.tag-latest }}
+      tag-suffix: ${{ matrix.tag-suffix }}
+      ffmpeg: ${{ matrix.ffmpeg }}
+      image-type: ${{ matrix.image-type }}
+      build-type: ${{ matrix.build-type }}
+      cuda-major-version: ${{ matrix.cuda-major-version }}
+      cuda-minor-version: ${{ matrix.cuda-minor-version }}
+      platforms: ${{ matrix.platforms }}
+      runs-on: ${{ matrix.runs-on }}
+      base-image: ${{ matrix.base-image }}
+      makeflags: ${{ matrix.makeflags }}
+    secrets:
+      dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
+      dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
+      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+    strategy:
+      matrix:
+        include:
+          - build-type: ''
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-ffmpeg-core'
+            ffmpeg: 'true'
+            image-type: 'core'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:22.04"
+            makeflags: "--jobs=5 --output-sync=target"
+          - build-type: 'sycl_f16'
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            base-image: "intel/oneapi-basekit:2024.0.1-devel-ubuntu22.04"
+            tag-suffix: 'sycl-f16-ffmpeg-core'
+            ffmpeg: 'true'
+            image-type: 'core'
+            runs-on: 'arc-runner-set'
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'cublas'
+            cuda-major-version: "12"
+            cuda-minor-version: "1"
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-cublas-cuda12-ffmpeg-core'
+            ffmpeg: 'true'
+            image-type: 'core'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:22.04"
+            makeflags: "--jobs=5 --output-sync=target"
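
The `max-parallel: ${{ github.event_name != 'pull_request' && 2 || 4 }}` expression restored in `extras-image-build` uses `&&` and `||` as a conditional: it yields 2 when the event is not a pull request and 4 otherwise. A minimal shell sketch of the same decision, where `max_parallel` is a hypothetical helper and not part of the workflow:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring:
#   github.event_name != 'pull_request' && 2 || 4
# The `&& 2 || 4` form behaves like a conditional because 2 is truthy.
max_parallel() {
  local event_name="$1"
  if [ "$event_name" != "pull_request" ]; then
    echo 2
  else
    echo 4
  fi
}

max_parallel pull_request
max_parallel push
```

Because `image-pr.yml` is triggered only by `pull_request`, the expression resolves to 4 whenever this workflow runs.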
--- a/.github/workflows/image.yml
+++ b/.github/workflows/image.yml
@@ -1,181 +1,305 @@
 ---
-  name: 'build container images'
+name: 'build container images'
+
+on:
+  push:
+    branches:
+      - master
+    tags:
+      - '*'
+
+concurrency:
+  group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
+  cancel-in-progress: true
+
+jobs:
+  self-hosted-jobs:
+    uses: ./.github/workflows/image_build.yml
+    with:
+      tag-latest: ${{ matrix.tag-latest }}
+      tag-suffix: ${{ matrix.tag-suffix }}
+      ffmpeg: ${{ matrix.ffmpeg }}
+      image-type: ${{ matrix.image-type }}
+      build-type: ${{ matrix.build-type }}
+      cuda-major-version: ${{ matrix.cuda-major-version }}
+      cuda-minor-version: ${{ matrix.cuda-minor-version }}
+      platforms: ${{ matrix.platforms }}
+      runs-on: ${{ matrix.runs-on }}
+      base-image: ${{ matrix.base-image }}
+      aio: ${{ matrix.aio }}
+      makeflags: ${{ matrix.makeflags }}
+      latest-image: ${{ matrix.latest-image }}
+      latest-image-aio: ${{ matrix.latest-image-aio }}
+    secrets:
+      dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
+      dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
+      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+    strategy:
+      # Pushing with all jobs in parallel
+      # eats the bandwidth of all the nodes
+      max-parallel: ${{ github.event_name != 'pull_request' && 2 || 4 }}
+      matrix:
+        include:
+          # Extra images
+          - build-type: ''
+            #platforms: 'linux/amd64,linux/arm64'
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: ''
+            ffmpeg: ''
+            image-type: 'extras'
+            runs-on: 'arc-runner-set'
+            base-image: "ubuntu:22.04"
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: ''
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-ffmpeg'
+            ffmpeg: 'true'
+            image-type: 'extras'
+            runs-on: 'arc-runner-set'
+            base-image: "ubuntu:22.04"
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'cublas'
+            cuda-major-version: "11"
+            cuda-minor-version: "7"
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-cublas-cuda11'
+            ffmpeg: ''
+            image-type: 'extras'
+            runs-on: 'arc-runner-set'
+            base-image: "ubuntu:22.04"
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'cublas'
+            cuda-major-version: "12"
+            cuda-minor-version: "1"
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-cublas-cuda12'
+            ffmpeg: ''
+            image-type: 'extras'
+            runs-on: 'arc-runner-set'
+            base-image: "ubuntu:22.04"
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'cublas'
+            cuda-major-version: "11"
+            cuda-minor-version: "7"
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-cublas-cuda11-ffmpeg'
+            ffmpeg: 'true'
+            image-type: 'extras'
+            runs-on: 'arc-runner-set'
+            base-image: "ubuntu:22.04"
+            aio: "-aio-gpu-nvidia-cuda-11"
+            latest-image: 'latest-gpu-nvidia-cuda-11'
+            latest-image-aio: 'latest-aio-gpu-nvidia-cuda-11'
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'cublas'
+            cuda-major-version: "12"
+            cuda-minor-version: "1"
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-cublas-cuda12-ffmpeg'
+            ffmpeg: 'true'
+            image-type: 'extras'
+            runs-on: 'arc-runner-set'
+            base-image: "ubuntu:22.04"
+            aio: "-aio-gpu-nvidia-cuda-12"
+            latest-image: 'latest-gpu-nvidia-cuda-12'
+            latest-image-aio: 'latest-aio-gpu-nvidia-cuda-12'
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: ''
+            #platforms: 'linux/amd64,linux/arm64'
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: ''
+            ffmpeg: ''
+            image-type: 'extras'
+            base-image: "ubuntu:22.04"
+            runs-on: 'arc-runner-set'
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'hipblas'
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-hipblas-ffmpeg'
+            ffmpeg: 'true'
+            image-type: 'extras'
+            aio: "-aio-gpu-hipblas"
+            base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+            latest-image: 'latest-gpu-hipblas'
+            latest-image-aio: 'latest-aio-gpu-hipblas'
+            runs-on: 'arc-runner-set'
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'hipblas'
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-hipblas'
+            ffmpeg: 'false'
+            image-type: 'extras'
+            base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+            runs-on: 'arc-runner-set'
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'sycl_f16'
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            base-image: "intel/oneapi-basekit:2024.0.1-devel-ubuntu22.04"
+            tag-suffix: '-sycl-f16-ffmpeg'
+            ffmpeg: 'true'
+            image-type: 'extras'
+            runs-on: 'arc-runner-set'
+            aio: "-aio-gpu-intel-f16"
+            latest-image: 'latest-gpu-intel-f16'
+            latest-image-aio: 'latest-aio-gpu-intel-f16'
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'sycl_f32'
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            base-image: "intel/oneapi-basekit:2024.0.1-devel-ubuntu22.04"
+            tag-suffix: '-sycl-f32-ffmpeg'
+            ffmpeg: 'true'
+            image-type: 'extras'
+            runs-on: 'arc-runner-set'
+            aio: "-aio-gpu-intel-f32"
+            latest-image: 'latest-gpu-intel-f32'
+            latest-image-aio: 'latest-aio-gpu-intel-f32'
+            makeflags: "--jobs=3 --output-sync=target"
+          # Core images
+          - build-type: 'sycl_f16'
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            base-image: "intel/oneapi-basekit:2024.0.1-devel-ubuntu22.04"
+            tag-suffix: '-sycl-f16-core'
+            ffmpeg: 'false'
+            image-type: 'core'
+            runs-on: 'arc-runner-set'
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'sycl_f32'
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            base-image: "intel/oneapi-basekit:2024.0.1-devel-ubuntu22.04"
+            tag-suffix: '-sycl-f32-core'
+            ffmpeg: 'false'
+            image-type: 'core'
+            runs-on: 'arc-runner-set'
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'sycl_f16'
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            base-image: "intel/oneapi-basekit:2024.0.1-devel-ubuntu22.04"
+            tag-suffix: '-sycl-f16-ffmpeg-core'
+            ffmpeg: 'true'
+            image-type: 'core'
+            runs-on: 'arc-runner-set'
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'sycl_f32'
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            base-image: "intel/oneapi-basekit:2024.0.1-devel-ubuntu22.04"
+            tag-suffix: '-sycl-f32-ffmpeg-core'
+            ffmpeg: 'true'
+            image-type: 'core'
+            runs-on: 'arc-runner-set'
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'hipblas'
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-hipblas-ffmpeg-core'
+            ffmpeg: 'true'
+            image-type: 'core'
+            base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+            runs-on: 'arc-runner-set'
+            makeflags: "--jobs=3 --output-sync=target"
+          - build-type: 'hipblas'
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-hipblas-core'
+            ffmpeg: 'false'
+            image-type: 'core'
+            base-image: "rocm/dev-ubuntu-22.04:6.0-complete"
+            runs-on: 'arc-runner-set'
+            makeflags: "--jobs=3 --output-sync=target"
  
-  on:
-    push:
-      branches:
-        - master
-      tags:
-        - '*'
-  
-  concurrency:
-    group: ci-${{ github.head_ref || github.ref }}-${{ github.repository }}
-    cancel-in-progress: true
-  
-  jobs:
-    hipblas-jobs:
-      if: github.repository == 'mudler/LocalAI'
-      uses: ./.github/workflows/image_build.yml
-      with:
-        tag-latest: ${{ matrix.tag-latest }}
-        tag-suffix: ${{ matrix.tag-suffix }}
-        build-type: ${{ matrix.build-type }}
-        cuda-major-version: ${{ matrix.cuda-major-version }}
-        cuda-minor-version: ${{ matrix.cuda-minor-version }}
-        platforms: ${{ matrix.platforms }}
-        runs-on: ${{ matrix.runs-on }}
-        base-image: ${{ matrix.base-image }}
-        grpc-base-image: ${{ matrix.grpc-base-image }}
-        makeflags: ${{ matrix.makeflags }}
-        ubuntu-version: ${{ matrix.ubuntu-version }}
-        ubuntu-codename: ${{ matrix.ubuntu-codename }}
-      secrets:
-        dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
-        dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
-        quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-        quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-      strategy:
-        matrix:
-          include:
-            - build-type: 'hipblas'
-              platforms: 'linux/amd64'
-              tag-latest: 'auto'
-              tag-suffix: '-gpu-hipblas'
-              base-image: "rocm/dev-ubuntu-24.04:6.4.4"
-              grpc-base-image: "ubuntu:24.04"
-              runs-on: 'ubuntu-latest'
-              makeflags: "--jobs=3 --output-sync=target"
-              ubuntu-version: '2404'
-              ubuntu-codename: 'noble'
-  
-    core-image-build:
-      if: github.repository == 'mudler/LocalAI'
-      uses: ./.github/workflows/image_build.yml
-      with:
-        tag-latest: ${{ matrix.tag-latest }}
-        tag-suffix: ${{ matrix.tag-suffix }}
-        build-type: ${{ matrix.build-type }}
-        cuda-major-version: ${{ matrix.cuda-major-version }}
-        cuda-minor-version: ${{ matrix.cuda-minor-version }}
-        platforms: ${{ matrix.platforms }}
-        runs-on: ${{ matrix.runs-on }}
-        base-image: ${{ matrix.base-image }}
-        grpc-base-image: ${{ matrix.grpc-base-image }}
-        makeflags: ${{ matrix.makeflags }}
-        skip-drivers: ${{ matrix.skip-drivers }}
-        ubuntu-version: ${{ matrix.ubuntu-version }}
-        ubuntu-codename: ${{ matrix.ubuntu-codename }}
-      secrets:
-        dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
-        dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
-        quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-        quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-      strategy:
-        #max-parallel: ${{ github.event_name != 'pull_request' && 2 || 4 }}
-        matrix:
-          include:
-            - build-type: ''
-              platforms: 'linux/amd64,linux/arm64'
-              tag-latest: 'auto'
-              tag-suffix: ''
-              base-image: "ubuntu:24.04"
-              runs-on: 'ubuntu-latest'
-              makeflags: "--jobs=4 --output-sync=target"
-              skip-drivers: 'false'
-              ubuntu-version: '2404'
-              ubuntu-codename: 'noble'
-            - build-type: 'cublas'
-              cuda-major-version: "12"
-              cuda-minor-version: "8"
-              platforms: 'linux/amd64'
-              tag-latest: 'auto'
-              tag-suffix: '-gpu-nvidia-cuda-12'
-              runs-on: 'ubuntu-latest'
-              base-image: "ubuntu:24.04"
-              skip-drivers: 'false'
-              makeflags: "--jobs=4 --output-sync=target"
-              ubuntu-version: '2404'
-              ubuntu-codename: 'noble'
-            - build-type: 'cublas'
-              cuda-major-version: "13"
-              cuda-minor-version: "0"
-              platforms: 'linux/amd64'
-              tag-latest: 'auto'
-              tag-suffix: '-gpu-nvidia-cuda-13'
-              runs-on: 'ubuntu-latest'
-              base-image: "ubuntu:22.04"
-              skip-drivers: 'false'
-              makeflags: "--jobs=4 --output-sync=target"
-              ubuntu-version: '2404'
-              ubuntu-codename: 'noble'
-            - build-type: 'vulkan'
-              platforms: 'linux/amd64,linux/arm64'
-              tag-latest: 'auto'
-              tag-suffix: '-gpu-vulkan'
-              runs-on: 'ubuntu-latest'
-              base-image: "ubuntu:24.04"
-              skip-drivers: 'false'
-              makeflags: "--jobs=4 --output-sync=target"
-              ubuntu-version: '2404'
-              ubuntu-codename: 'noble'
-            - build-type: 'intel'
-              platforms: 'linux/amd64'
-              tag-latest: 'auto'
-              base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
-              grpc-base-image: "ubuntu:24.04"
-              tag-suffix: '-gpu-intel'
-              runs-on: 'ubuntu-latest'
-              makeflags: "--jobs=3 --output-sync=target"
-              ubuntu-version: '2404'
-              ubuntu-codename: 'noble'
-  
-    gh-runner:
-      if: github.repository == 'mudler/LocalAI'
-      uses: ./.github/workflows/image_build.yml
-      with:
-        tag-latest: ${{ matrix.tag-latest }}
-        tag-suffix: ${{ matrix.tag-suffix }}
-        build-type: ${{ matrix.build-type }}
-        cuda-major-version: ${{ matrix.cuda-major-version }}
-        cuda-minor-version: ${{ matrix.cuda-minor-version }}
-        platforms: ${{ matrix.platforms }}
-        runs-on: ${{ matrix.runs-on }}
-        base-image: ${{ matrix.base-image }}
-        grpc-base-image: ${{ matrix.grpc-base-image }}
-        makeflags: ${{ matrix.makeflags }}
-        skip-drivers: ${{ matrix.skip-drivers }}
-        ubuntu-version: ${{ matrix.ubuntu-version }}
-        ubuntu-codename: ${{ matrix.ubuntu-codename }}
-      secrets:
-        dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
-        dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
-        quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
-        quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
-      strategy:
-        matrix:
-          include:
-            - build-type: 'cublas'
-              cuda-major-version: "12"
-              cuda-minor-version: "0"
-              platforms: 'linux/arm64'
-              tag-latest: 'auto'
-              tag-suffix: '-nvidia-l4t-arm64'
-              base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
-              runs-on: 'ubuntu-24.04-arm'
-              makeflags: "--jobs=4 --output-sync=target"
-              skip-drivers: 'true'
-              ubuntu-version: "2204"
-              ubuntu-codename: 'jammy'
-            - build-type: 'cublas'
-              cuda-major-version: "13"
-              cuda-minor-version: "0"
-              platforms: 'linux/arm64'
-              tag-latest: 'auto'
-              tag-suffix: '-nvidia-l4t-arm64-cuda-13'
-              base-image: "ubuntu:24.04"
-              runs-on: 'ubuntu-24.04-arm'
-              makeflags: "--jobs=4 --output-sync=target"
-              skip-drivers: 'false'
-              ubuntu-version: '2404'
-              ubuntu-codename: 'noble'
-  
+  core-image-build:
+    uses: ./.github/workflows/image_build.yml
+    with:
+      tag-latest: ${{ matrix.tag-latest }}
+      tag-suffix: ${{ matrix.tag-suffix }}
+      ffmpeg: ${{ matrix.ffmpeg }}
+      image-type: ${{ matrix.image-type }}
+      build-type: ${{ matrix.build-type }}
+      cuda-major-version: ${{ matrix.cuda-major-version }}
+      cuda-minor-version: ${{ matrix.cuda-minor-version }}
+      platforms: ${{ matrix.platforms }}
+      runs-on: ${{ matrix.runs-on }}
+      aio: ${{ matrix.aio }}
+      base-image: ${{ matrix.base-image }}
+      makeflags: ${{ matrix.makeflags }}
+      latest-image: ${{ matrix.latest-image }}
+      latest-image-aio: ${{ matrix.latest-image-aio }}
+    secrets:
+      dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
+      dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
+      quayUsername: ${{ secrets.LOCALAI_REGISTRY_USERNAME }}
+      quayPassword: ${{ secrets.LOCALAI_REGISTRY_PASSWORD }}
+    strategy:
+      matrix:
+        include:
+          - build-type: ''
+            platforms: 'linux/amd64'
+            tag-latest: 'auto'
+            tag-suffix: '-ffmpeg-core'
+            ffmpeg: 'true'
+            image-type: 'core'
+            base-image: "ubuntu:22.04"
+            runs-on: 'ubuntu-latest'
+            aio: "-aio-cpu"
+            latest-image: 'latest-cpu'
+            latest-image-aio: 'latest-aio-cpu'
+            makeflags: "--jobs=5 --output-sync=target"
+          - build-type: 'cublas'
+            cuda-major-version: "11"
+            cuda-minor-version: "7"
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-cublas-cuda11-core'
+            ffmpeg: ''
+            image-type: 'core'
+            base-image: "ubuntu:22.04"
+            runs-on: 'ubuntu-latest'
+            makeflags: "--jobs=5 --output-sync=target"
+          - build-type: 'cublas'
+            cuda-major-version: "12"
+            cuda-minor-version: "1"
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-cublas-cuda12-core'
+            ffmpeg: ''
+            image-type: 'core'
+            base-image: "ubuntu:22.04"
+            runs-on: 'ubuntu-latest'
+            makeflags: "--jobs=5 --output-sync=target"
+          - build-type: 'cublas'
+            cuda-major-version: "11"
+            cuda-minor-version: "7"
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-cublas-cuda11-ffmpeg-core'
+            ffmpeg: 'true'
+            image-type: 'core'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:22.04"
+            makeflags: "--jobs=5 --output-sync=target"
+          - build-type: 'cublas'
+            cuda-major-version: "12"
+            cuda-minor-version: "1"
+            platforms: 'linux/amd64'
+            tag-latest: 'false'
+            tag-suffix: '-cublas-cuda12-ffmpeg-core'
+            ffmpeg: 'true'
+            image-type: 'core'
+            runs-on: 'ubuntu-latest'
+            base-image: "ubuntu:22.04"
+            makeflags: "--jobs=5 --output-sync=target"
--- a/.github/workflows/image_build.yml
+++ b/.github/workflows/image_build.yml
@@ -6,10 +6,6 @@ on:
    inputs:
      base-image:
        description: 'Base image'
-        required: true
-        type: string
-      grpc-base-image:
-        description: 'GRPC Base image, must be a compatible image with base-image'
        required: false
        default: ''
        type: string
@@ -19,11 +15,11 @@ on:
        type: string
      cuda-major-version:
        description: 'CUDA major version'
-        default: "12"
+        default: "11"
        type: string
      cuda-minor-version:
        description: 'CUDA minor version'
-        default: "9"
+        default: "7"
        type: string
      platforms:
        description: 'Platforms'
@@ -33,13 +29,25 @@ on:
        description: 'Tag latest'
        default: ''
        type: string
+      latest-image:
+        description: 'Tag name for the latest image (e.g. latest-cpu)'
+        default: ''
+        type: string
+      latest-image-aio:
+        description: 'Tag name for the latest AIO image (e.g. latest-aio-cpu)'
+        default: ''
+        type: string
      tag-suffix:
        description: 'Tag suffix'
        default: ''
        type: string
-      skip-drivers:
-        description: 'Skip drivers by default'
-        default: 'false'
+      ffmpeg:
+        description: 'FFMPEG'
+        default: ''
+        type: string
+      image-type:
+        description: 'Image type'
+        default: ''
        type: string
      runs-on:
        description: 'Runs on'
@@ -49,17 +57,12 @@ on:
      makeflags:
        description: 'Make Flags'
        required: false
-        default: '--jobs=4 --output-sync=target'
+        default: '--jobs=3 --output-sync=target'
        type: string
-      ubuntu-version:
-        description: 'Ubuntu version'
+      aio:
+        description: 'AIO Image Name'
        required: false
-        default: '2204'
-        type: string
-      ubuntu-codename:
-        description: 'Ubuntu codename'
-        required: false
-        default: 'noble'
+        default: ''
        type: string
    secrets:
      dockerUsername:
@@ -74,22 +77,6 @@ jobs:
  reusable_image-build:
    runs-on: ${{ inputs.runs-on }}
    steps:
-
-      - name: Free Disk Space (Ubuntu)
-        if: inputs.runs-on == 'ubuntu-latest'
-        uses: jlumbroso/free-disk-space@main
-        with:
-          # this might remove tools that are actually needed,
-          # if set to "true" but frees about 6 GB
-          tool-cache: true
-          # all of these default to true, but feel free to set to
-          # "false" if necessary for your workflow
-          android: true
-          dotnet: true
-          haskell: true
-          large-packages: true
-          docker-images: true
-          swap-storage: true
      - name: Force Install GIT latest
        run: |
          sudo apt-get update \
@@ -99,7 +86,7 @@ jobs:
          && sudo apt-get update \
          && sudo apt-get install -y git
      - name: Checkout
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4

      - name: Release space from worker
        if: inputs.runs-on == 'ubuntu-latest'
@@ -111,8 +98,8 @@ jobs:
          df -h
          echo
          sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
-          sudo apt-get remove --auto-remove android-sdk-platform-tools snapd || true
-          sudo apt-get purge --auto-remove android-sdk-platform-tools snapd || true
+          sudo apt-get remove --auto-remove android-sdk-platform-tools || true
+          sudo apt-get purge --auto-remove android-sdk-platform-tools || true
          sudo rm -rf /usr/local/lib/android
          sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
          sudo rm -rf /usr/share/dotnet
@@ -145,8 +132,7 @@ jobs:

      - name: Docker meta
        id: meta
-        if: github.event_name != 'pull_request'
-        uses: docker/metadata-action@v6
+        uses: docker/metadata-action@v5
        with:
          images: |
            quay.io/go-skynet/local-ai
@@ -157,21 +143,36 @@ jobs:
            type=sha
          flavor: |
            latest=${{ inputs.tag-latest }}
-            suffix=${{ inputs.tag-suffix }},onlatest=true
-      - name: Docker meta for PR
-        id: meta_pull_request
-        if: github.event_name == 'pull_request'
-        uses: docker/metadata-action@v6
+            suffix=${{ inputs.tag-suffix }}
+
+      - name: Docker meta AIO (quay.io)
+        if: inputs.aio != ''
+        id: meta_aio
+        uses: docker/metadata-action@v5
        with:
          images: |
-            quay.io/go-skynet/ci-tests
+            quay.io/go-skynet/local-ai
          tags: |
-            type=ref,event=branch,suffix=localai${{ github.event.number }}-${{ inputs.build-type }}-${{ inputs.cuda-major-version }}-${{ inputs.cuda-minor-version }}
-            type=semver,pattern={{raw}},suffix=localai${{ github.event.number }}-${{ inputs.build-type }}-${{ inputs.cuda-major-version }}-${{ inputs.cuda-minor-version }}
-            type=sha,suffix=localai${{ github.event.number }}-${{ inputs.build-type }}-${{ inputs.cuda-major-version }}-${{ inputs.cuda-minor-version }}
+            type=ref,event=branch
+            type=semver,pattern={{raw}}
          flavor: |
            latest=${{ inputs.tag-latest }}
-            suffix=${{ inputs.tag-suffix }}
+            suffix=${{ inputs.aio }}
+
+      - name: Docker meta AIO (dockerhub)
+        if: inputs.aio != ''
+        id: meta_aio_dockerhub
+        uses: docker/metadata-action@v5
+        with:
+          images: |
+            localai/localai
+          tags: |
+            type=ref,event=branch
+            type=semver,pattern={{raw}}
+          flavor: |
+            latest=${{ inputs.tag-latest }}
+            suffix=${{ inputs.aio }}
+
      - name: Set up QEMU
        uses: docker/setup-qemu-action@master
        with:
@@ -183,40 +184,50 @@ jobs:

      - name: Login to DockerHub
        if: github.event_name != 'pull_request'
-        uses: docker/login-action@v4
+        uses: docker/login-action@v3
        with:
          username: ${{ secrets.dockerUsername }}
          password: ${{ secrets.dockerPassword }}

      - name: Login to DockerHub
        if: github.event_name != 'pull_request'
-        uses: docker/login-action@v4
+        uses: docker/login-action@v3
        with:
          registry: quay.io
          username: ${{ secrets.quayUsername }}
          password: ${{ secrets.quayPassword }}

-      - name: Build and push
-        uses: docker/build-push-action@v7
-        if: github.event_name != 'pull_request'
+      - name: Cache GRPC
+        uses: docker/build-push-action@v5
+        with:
+          builder: ${{ steps.buildx.outputs.name }}
+          build-args: |
+            IMAGE_TYPE=${{ inputs.image-type }}
+            BASE_IMAGE=${{ inputs.base-image }}
+            MAKEFLAGS=${{ inputs.makeflags }}
+            GRPC_VERSION=v1.58.0
+          context: .
+          file: ./Dockerfile
+          cache-from: type=gha
+          cache-to: type=gha,ignore-error=true
+          target: grpc
+          platforms: ${{ inputs.platforms }}
+          push: false
+          tags: ${{ steps.meta.outputs.tags }}
+          labels: ${{ steps.meta.outputs.labels }}
+
+      - name: Build and push
+        uses: docker/build-push-action@v5
        with:
          builder: ${{ steps.buildx.outputs.name }}
-          # The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
-          # This means that even the MAKEFLAGS have to be an EXACT match.
-          # If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
-          # This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
          build-args: |
            BUILD_TYPE=${{ inputs.build-type }}
            CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
            CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
+            FFMPEG=${{ inputs.ffmpeg }}
+            IMAGE_TYPE=${{ inputs.image-type }}
            BASE_IMAGE=${{ inputs.base-image }}
-            GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
-            GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
-            GRPC_VERSION=v1.65.0
            MAKEFLAGS=${{ inputs.makeflags }}
-            SKIP_DRIVERS=${{ inputs.skip-drivers }}
-            UBUNTU_VERSION=${{ inputs.ubuntu-version }}
-            UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
          context: .
          file: ./Dockerfile
          cache-from: type=gha
@@ -224,36 +235,71 @@ jobs:
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
-### Start testing image
-      - name: Build and push
-        uses: docker/build-push-action@v7
-        if: github.event_name == 'pull_request'
+
+      - name: Inspect image
+        if: github.event_name != 'pull_request'
+        run: |
+          docker pull localai/localai:${{ steps.meta.outputs.version }}
+          docker image inspect localai/localai:${{ steps.meta.outputs.version }}
+          docker pull quay.io/go-skynet/local-ai:${{ steps.meta.outputs.version }}
+          docker image inspect quay.io/go-skynet/local-ai:${{ steps.meta.outputs.version }}
+
+      - name: Build and push AIO image
+        if: inputs.aio != ''
+        uses: docker/build-push-action@v5
        with:
          builder: ${{ steps.buildx.outputs.name }}
-          # The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
-          # This means that even the MAKEFLAGS have to be an EXACT match.
-          # If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
-          # This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
          build-args: |
-            BUILD_TYPE=${{ inputs.build-type }}
-            CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
-            CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
-            BASE_IMAGE=${{ inputs.base-image }}
-            GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
-            GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
-            GRPC_VERSION=v1.65.0
+            BASE_IMAGE=quay.io/go-skynet/local-ai:${{ steps.meta.outputs.version }}
            MAKEFLAGS=${{ inputs.makeflags }}
-            SKIP_DRIVERS=${{ inputs.skip-drivers }}
-            UBUNTU_VERSION=${{ inputs.ubuntu-version }}
-            UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
          context: .
-          file: ./Dockerfile
-          cache-from: type=gha
+          file: ./Dockerfile.aio
          platforms: ${{ inputs.platforms }}
-          #push: true
-          tags: ${{ steps.meta_pull_request.outputs.tags }}
-          labels: ${{ steps.meta_pull_request.outputs.labels }}
-## End testing image
+          push: ${{ github.event_name != 'pull_request' }}
+          tags: ${{ steps.meta_aio.outputs.tags }}
+          labels: ${{ steps.meta_aio.outputs.labels }}
+
+      - name: Build and push AIO image (dockerhub)
+        if: inputs.aio != ''
+        uses: docker/build-push-action@v5
+        with:
+          builder: ${{ steps.buildx.outputs.name }}
+          build-args: |
+            BASE_IMAGE=localai/localai:${{ steps.meta.outputs.version }}
+            MAKEFLAGS=${{ inputs.makeflags }}
+          context: .
+          file: ./Dockerfile.aio
+          platforms: ${{ inputs.platforms }}
+          push: ${{ github.event_name != 'pull_request' }}
+          tags: ${{ steps.meta_aio_dockerhub.outputs.tags }}
+          labels: ${{ steps.meta_aio_dockerhub.outputs.labels }}
+
+      - name: Latest tag
+        # run only on tag pushes (never on pull requests) when latest-image is defined
+        if: github.event_name != 'pull_request' && inputs.latest-image != '' && github.ref_type == 'tag'
+        run: |
+          docker pull localai/localai:${{ steps.meta.outputs.version }}
+          docker tag localai/localai:${{ steps.meta.outputs.version }} localai/localai:${{ inputs.latest-image }}
+          docker push localai/localai:${{ inputs.latest-image }}
+          docker pull quay.io/go-skynet/local-ai:${{ steps.meta.outputs.version }}
+          docker tag quay.io/go-skynet/local-ai:${{ steps.meta.outputs.version }} quay.io/go-skynet/local-ai:${{ inputs.latest-image }}
+          docker push quay.io/go-skynet/local-ai:${{ inputs.latest-image }}
+      - name: Latest AIO tag
+        # run only on tag pushes (never on pull requests) when latest-image-aio is defined
+        if: github.event_name != 'pull_request' && inputs.latest-image-aio != '' && github.ref_type == 'tag'
+        run: |
+          docker pull localai/localai:${{ steps.meta_aio_dockerhub.outputs.version }}
+          docker tag localai/localai:${{ steps.meta_aio_dockerhub.outputs.version }} localai/localai:${{ inputs.latest-image-aio }}
+          docker push localai/localai:${{ inputs.latest-image-aio }}
+          docker pull quay.io/go-skynet/local-ai:${{ steps.meta_aio.outputs.version }}
+          docker tag quay.io/go-skynet/local-ai:${{ steps.meta_aio.outputs.version }} quay.io/go-skynet/local-ai:${{ inputs.latest-image-aio }}
+          docker push quay.io/go-skynet/local-ai:${{ inputs.latest-image-aio }}
+
      - name: job summary
        run: |
          echo "Built image: ${{ steps.meta.outputs.labels }}" >> $GITHUB_STEP_SUMMARY
+
+      - name: job summary (AIO)
+        if: inputs.aio != ''
+        run: |
+          echo "Built image: ${{ steps.meta_aio.outputs.labels }}" >> $GITHUB_STEP_SUMMARY
--- a/.github/workflows/disabled/labeler.yml
+++ b/.github/workflows/disabled/labeler.yml
@@ -9,4 +9,4 @@ jobs:
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
-    - uses: actions/labeler@v6
+    - uses: actions/labeler@v5
--- a/.github/workflows/disabled/localaibot_automerge.yml
+++ b/.github/workflows/disabled/localaibot_automerge.yml
@@ -6,15 +6,14 @@ permissions:
  contents: write
  pull-requests: write
  packages: read
-  issues: write # for Homebrew/actions/post-comment
-  actions: write # to dispatch publish workflow
+
 jobs:
  dependabot:
-    if: github.repository == 'mudler/LocalAI' && github.actor == 'localai-bot' && contains(github.event.pull_request.title, 'chore:')
    runs-on: ubuntu-latest
+    if: ${{ github.actor == 'localai-bot' }}
    steps:
      - name: Checkout repository
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4

      - name: Approve a PR if not already approved
        run: |
--- a/.github/workflows/notify-releases.yaml
+++ b/.github/workflows/notify-releases.yaml
@@ -1,65 +0,0 @@
-name: Release notifications
-on:
-  release:
-    types:
-      - published
-
-jobs:
-  notify-discord:
-    if: github.repository == 'mudler/LocalAI'
-    runs-on: ubuntu-latest
-    env:
-        RELEASE_BODY: ${{ github.event.release.body }}
-        RELEASE_TITLE: ${{ github.event.release.name }}
-        RELEASE_TAG_NAME: ${{ github.event.release.tag_name }}
-        MODEL_NAME: gemma-3-12b-it-qat
-    steps:
-    - uses: mudler/localai-github-action@v1
-      with:
-        model: 'gemma-3-12b-it-qat' # Any from models.localai.io, or from huggingface.com with: "huggingface://<repository>/file"
-    - name: Summarize
-      id: summarize
-      run: |
-            input="$RELEASE_TITLE\b$RELEASE_BODY"
-
-            # Define the LocalAI API endpoint
-            API_URL="http://localhost:8080/chat/completions"
-
-            # Create a JSON payload using jq to handle special characters
-            json_payload=$(jq -n --arg input "$input" '{
-            model: "'$MODEL_NAME'",
-            messages: [
-                {
-                role: "system",
-                content: "Write a discord message with a bullet point summary of the release notes."
-                },
-                {
-                role: "user",
-                content: $input
-                }
-            ]
-            }')
-
-            # Send the request to LocalAI API
-            response=$(curl -s -X POST $API_URL \
-            -H "Content-Type: application/json" \
-            -d "$json_payload")
-
-            # Extract the summary from the response
-            summary=$(echo $response | jq -r '.choices[0].message.content')
-
-            # Print the summary
-            #  -H "Authorization: Bearer $API_KEY" \
-            {
-                echo 'message<<EOF'
-                echo "$summary"
-                echo EOF
-              } >> "$GITHUB_OUTPUT"
-    - name: Discord notification
-      env:
-        DISCORD_WEBHOOK: ${{ secrets.DISCORD_WEBHOOK_URL_RELEASE }}
-        DISCORD_USERNAME: "LocalAI-Bot"
-        DISCORD_AVATAR: "https://avatars.githubusercontent.com/u/139863280?v=4"
-      uses: Ilshidur/action-discord@master
-      with:
-        args: ${{ steps.summarize.outputs.message }}
--- a/.github/workflows/release.yaml
+++ b/.github/workflows/release.yaml
@@ -1,64 +1,218 @@
-name: goreleaser
+name: Build and Release

-on:
-  push:
-    tags:
-      - 'v*'
+on: 
+- push
+- pull_request
+
+env:
+  GRPC_VERSION: v1.58.0
+
+permissions:
+  contents: write
+
+concurrency:
+  group: ci-releases-${{ github.head_ref || github.ref }}-${{ github.repository }}
+  cancel-in-progress: true

 jobs:
-  goreleaser:
+  build-linux:
+    strategy:
+      matrix:
+        include:
+          - build: 'avx2'
+            defines: ''
+          - build: 'avx'
+            defines: '-DLLAMA_AVX2=OFF'
+          - build: 'avx512'
+            defines: '-DLLAMA_AVX512=ON'
+          - build: 'cuda12'
+            defines: ''
+          - build: 'cuda11'
+            defines: ''
    runs-on: ubuntu-latest
    steps:
-      - name: Checkout
-        uses: actions/checkout@v6
+      - name: Clone
+        uses: actions/checkout@v4
        with:
-          fetch-depth: 0
-      - name: Set up Go
-        uses: actions/setup-go@v5
+          submodules: true
+      - uses: actions/setup-go@v5
        with:
-          go-version: 1.23
-      - name: Run GoReleaser
-        uses: goreleaser/goreleaser-action@v7
-        with:
-          version: v2.11.0
-          args: release --clean
-        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-  launcher-build-darwin:
-    runs-on: macos-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v6
-        with:
-          fetch-depth: 0
-      - name: Set up Go
-        uses: actions/setup-go@v5
-        with:
-          go-version: 1.23
-      - name: Build launcher for macOS ARM64
-        run: |
-          make build-launcher-darwin
-      - name: Upload DMG to Release
-        uses: softprops/action-gh-release@v2
-        with:
-          files: ./dist/LocalAI.dmg
-  launcher-build-linux:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v6
-        with:
-          fetch-depth: 0
-      - name: Set up Go
-        uses: actions/setup-go@v5
-        with:
-          go-version: 1.23
-      - name: Build launcher for Linux
+          go-version: '1.21.x'
+          cache: false
+      - name: Dependencies
        run: |
          sudo apt-get update
-          sudo apt-get install golang gcc libgl1-mesa-dev xorg-dev libxkbcommon-dev
-          make build-launcher-linux
-      - name: Upload Linux launcher artifacts
-        uses: softprops/action-gh-release@v2
+          sudo apt-get install build-essential ffmpeg protobuf-compiler
+      - name: Install CUDA Dependencies
+        if: ${{ matrix.build == 'cuda12' || matrix.build == 'cuda11' }}
+        run: |
+          if [ "${{ matrix.build }}" == "cuda12" ]; then
+            export CUDA_VERSION=12-3
+          else
+            export CUDA_VERSION=11-7
+          fi
+          curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
+          sudo dpkg -i cuda-keyring_1.1-1_all.deb
+          sudo apt-get update
+          sudo apt-get install -y cuda-nvcc-${CUDA_VERSION} libcublas-dev-${CUDA_VERSION}
+      - name: Cache grpc
+        id: cache-grpc
+        uses: actions/cache@v4
        with:
-          files: ./local-ai-launcher-linux.tar.xz
+          path: grpc
+          key: ${{ runner.os }}-grpc-${{ env.GRPC_VERSION }}
+      - name: Build grpc
+        if: steps.cache-grpc.outputs.cache-hit != 'true'
+        run: |
+          git clone --recurse-submodules -b ${{ env.GRPC_VERSION }} --depth 1 --shallow-submodules https://github.com/grpc/grpc && \
+          cd grpc && mkdir -p cmake/build && cd cmake/build && cmake -DgRPC_INSTALL=ON \
+            -DgRPC_BUILD_TESTS=OFF \
+            ../.. && sudo make --jobs 5 --output-sync=target
+      - name: Install gRPC
+        run: |
+          cd grpc && cd cmake/build && sudo make --jobs 5 --output-sync=target install
+      - name: Build
+        id: build
+        env:
+          CMAKE_ARGS: "${{ matrix.defines }}"
+          BUILD_ID: "${{ matrix.build }}"
+        run: |
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
+          export PATH=$PATH:$GOPATH/bin
+          if [ "${{ matrix.build }}" == "cuda12" ] || [ "${{ matrix.build }}" == "cuda11" ]; then
+            export BUILD_TYPE=cublas
+            export PATH=/usr/local/cuda/bin:$PATH
+            make dist
+          else
+            STATIC=true make dist
+          fi
+      - uses: actions/upload-artifact@v4
+        with:
+          name: LocalAI-linux-${{ matrix.build }}
+          path: release/
+      - name: Release
+        uses: softprops/action-gh-release@v2
+        if: startsWith(github.ref, 'refs/tags/')
+        with:
+          files: |
+            release/*
+
+  build-stablediffusion:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v4
+        with:
+          submodules: true
+      - uses: actions/setup-go@v5
+        with:
+          go-version: '1.21.x'
+          cache: false
+      - name: Dependencies
+        run: |
+          sudo apt-get install -y --no-install-recommends libopencv-dev protobuf-compiler
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
+      - name: Build stablediffusion
+        run: |
+          export PATH=$PATH:$GOPATH/bin
+          make backend-assets/grpc/stablediffusion
+          mkdir -p release && cp backend-assets/grpc/stablediffusion release
+      - uses: actions/upload-artifact@v4
+        with:
+          name: stablediffusion
+          path: release/
+
+  build-macOS:
+    strategy:
+      matrix:
+        include:
+          - build: 'avx2'
+            defines: ''
+          - build: 'avx'
+            defines: '-DLLAMA_AVX2=OFF'
+          - build: 'avx512'
+            defines: '-DLLAMA_AVX512=ON'
+    runs-on: macOS-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v4
+        with:
+          submodules: true
+      - uses: actions/setup-go@v5
+        with:
+          go-version: '1.21.x'
+          cache: false
+      - name: Dependencies
+        run: |
+          brew install protobuf grpc
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
+      - name: Build
+        id: build
+        env:
+          CMAKE_ARGS: "${{ matrix.defines }}"
+          BUILD_ID: "${{ matrix.build }}"
+        run: |
+          export C_INCLUDE_PATH=/usr/local/include
+          export CPLUS_INCLUDE_PATH=/usr/local/include
+          export PATH=$PATH:$GOPATH/bin
+          make dist
+      - uses: actions/upload-artifact@v4
+        with:
+          name: LocalAI-MacOS-${{ matrix.build }}
+          path: release/
+      - name: Release
+        uses: softprops/action-gh-release@v2
+        if: startsWith(github.ref, 'refs/tags/')
+        with:
+          files: |
+            release/*
+
+
+  build-macOS-arm64:
+    strategy:
+      matrix:
+        include:
+          - build: 'avx2'
+            defines: ''
+          - build: 'avx'
+            defines: '-DLLAMA_AVX2=OFF'
+          - build: 'avx512'
+            defines: '-DLLAMA_AVX512=ON'
+    runs-on: macos-14
+    steps:
+      - name: Clone
+        uses: actions/checkout@v4
+        with:
+          submodules: true
+      - uses: actions/setup-go@v5
+        with:
+          go-version: '1.21.x'
+          cache: false
+      - name: Dependencies
+        run: |
+          brew install protobuf grpc
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
+      - name: Build
+        id: build
+        env:
+          CMAKE_ARGS: "${{ matrix.defines }}"
+          BUILD_ID: "${{ matrix.build }}"
+        run: |
+          export C_INCLUDE_PATH=/usr/local/include
+          export CPLUS_INCLUDE_PATH=/usr/local/include
+          export PATH=$PATH:$GOPATH/bin
+          make dist
+      - uses: actions/upload-artifact@v4
+        with:
+          name: LocalAI-MacOS-arm64-${{ matrix.build }}
+          path: release/
+      - name: Release
+        uses: softprops/action-gh-release@v2
+        if: startsWith(github.ref, 'refs/tags/')
+        with:
+          files: |
+            release/*
--- a/.github/workflows/secscan.yaml
+++ b/.github/workflows/secscan.yaml
@@ -14,17 +14,17 @@ jobs:
      GO111MODULE: on
    steps:
      - name: Checkout Source
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        if: ${{ github.actor != 'dependabot[bot]' }}
      - name: Run Gosec Security Scanner
        if: ${{ github.actor != 'dependabot[bot]' }}
-        uses: securego/gosec@v2.22.9
+        uses: securego/gosec@master
        with:
          # we let the report trigger content trigger a failure using the GitHub Security features.
          args: '-no-fail -fmt sarif -out results.sarif ./...'
      - name: Upload SARIF file
        if: ${{ github.actor != 'dependabot[bot]' }}
-        uses: github/codeql-action/upload-sarif@v4
+        uses: github/codeql-action/upload-sarif@v3
        with:
          # Path to SARIF file relative to the root of the repository
          sarif_file: results.sarif
--- a/.github/workflows/stalebot.yml
+++ b/.github/workflows/stalebot.yml
@@ -1,25 +0,0 @@
-name: 'Close stale issues and PRs'
-permissions:
-  issues: write
-  pull-requests: write
-on:
-  schedule:
-    - cron: '30 1 * * *'
-
-jobs:
-  stale:
-    if: github.repository == 'mudler/LocalAI'
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/stale@b5d41d4e1d5dceea10e7104786b73624c18a190f # v9
-        with:
-          stale-issue-message: 'This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.'
-          stale-pr-message: 'This PR is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 10 days.'
-          close-issue-message: 'This issue was closed because it has been stalled for 5 days with no activity.'
-          close-pr-message: 'This PR was closed because it has been stalled for 10 days with no activity.'
-          days-before-issue-stale: 90
-          days-before-pr-stale: 90
-          days-before-issue-close: 5
-          days-before-pr-close: 10
-          exempt-issue-labels: 'roadmap'
-          exempt-pr-labels: 'roadmap'
--- a/.github/workflows/test-extra.yml
+++ b/.github/workflows/test-extra.yml
@@ -14,170 +14,189 @@ concurrency:
  cancel-in-progress: true

 jobs:
-  detect-changes:
-    runs-on: ubuntu-latest
-    outputs:
-      run-all: ${{ steps.detect.outputs.run-all }}
-      transformers: ${{ steps.detect.outputs.transformers }}
-      rerankers: ${{ steps.detect.outputs.rerankers }}
-      diffusers: ${{ steps.detect.outputs.diffusers }}
-      coqui: ${{ steps.detect.outputs.coqui }}
-      moonshine: ${{ steps.detect.outputs.moonshine }}
-      pocket-tts: ${{ steps.detect.outputs.pocket-tts }}
-      qwen-tts: ${{ steps.detect.outputs.qwen-tts }}
-      qwen-asr: ${{ steps.detect.outputs.qwen-asr }}
-      nemo: ${{ steps.detect.outputs.nemo }}
-      voxcpm: ${{ steps.detect.outputs.voxcpm }}
-      llama-cpp-quantization: ${{ steps.detect.outputs.llama-cpp-quantization }}
-      acestep-cpp: ${{ steps.detect.outputs.acestep-cpp }}
-      voxtral: ${{ steps.detect.outputs.voxtral }}
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v6
-      - name: Setup Bun
-        uses: oven-sh/setup-bun@v2
-      - name: Install dependencies
-        run: bun add js-yaml @octokit/core
-      - name: Detect changed backends
-        id: detect
-        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          GITHUB_EVENT_PATH: ${{ github.event_path }}
-        run: bun run scripts/changed-backends.js
-
-  # Requires CUDA
-  # tests-chatterbox-tts:
-  #   runs-on: ubuntu-latest
-  #   steps:
-  #     - name: Clone
-  #       uses: actions/checkout@v6
-  #       with:
-  #         submodules: true
-  #     - name: Dependencies
-  #       run: |
-  #         sudo apt-get update
-  #         sudo apt-get install build-essential ffmpeg
-  #         # Install UV
-  #         curl -LsSf https://astral.sh/uv/install.sh | sh
-  #         sudo apt-get install -y ca-certificates cmake curl patch python3-pip
-  #         sudo apt-get install -y libopencv-dev
-  #         pip install --user --no-cache-dir grpcio-tools==1.64.1
-
-  #     - name: Test chatterbox-tts
-  #       run: |
-  #          make --jobs=5 --output-sync=target -C backend/python/chatterbox
-  #          make --jobs=5 --output-sync=target -C backend/python/chatterbox test
  tests-transformers:
-    needs: detect-changes
-    if: needs.detect-changes.outputs.transformers == 'true' || needs.detect-changes.outputs.run-all == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v6
-        with:
+        uses: actions/checkout@v4
+        with: 
          submodules: true
      - name: Dependencies
        run: |
          sudo apt-get update
          sudo apt-get install build-essential ffmpeg
-          # Install UV
-          curl -LsSf https://astral.sh/uv/install.sh | sh
+          curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
+             sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
+              gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
+             sudo apt-get update && \
+             sudo apt-get install -y conda
          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
          sudo apt-get install -y libopencv-dev
-          pip install --user --no-cache-dir grpcio-tools==1.64.1
+          pip install --user grpcio-tools
+          
+          sudo rm -rfv /usr/bin/conda || true

      - name: Test transformers
        run: |
+           export PATH=$PATH:/opt/conda/bin
           make --jobs=5 --output-sync=target -C backend/python/transformers
           make --jobs=5 --output-sync=target -C backend/python/transformers test
-  tests-rerankers:
-    needs: detect-changes
-    if: needs.detect-changes.outputs.rerankers == 'true' || needs.detect-changes.outputs.run-all == 'true'
+
+  tests-sentencetransformers:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v6
-        with:
+        uses: actions/checkout@v4
+        with: 
          submodules: true
      - name: Dependencies
        run: |
          sudo apt-get update
          sudo apt-get install build-essential ffmpeg
-          # Install UV
-          curl -LsSf https://astral.sh/uv/install.sh | sh
+          curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
+             sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
+              gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
+             sudo apt-get update && \
+             sudo apt-get install -y conda
          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
          sudo apt-get install -y libopencv-dev
-          pip install --user --no-cache-dir grpcio-tools==1.64.1
+          pip install --user grpcio-tools
+          
+          sudo rm -rfv /usr/bin/conda || true

-      - name: Test rerankers
+      - name: Test sentencetransformers
        run: |
-           make --jobs=5 --output-sync=target -C backend/python/rerankers
-           make --jobs=5 --output-sync=target -C backend/python/rerankers test
+           export PATH=$PATH:/opt/conda/bin
+           make --jobs=5 --output-sync=target -C backend/python/sentencetransformers
+           make --jobs=5 --output-sync=target -C backend/python/sentencetransformers test

  tests-diffusers:
-    needs: detect-changes
-    if: needs.detect-changes.outputs.diffusers == 'true' || needs.detect-changes.outputs.run-all == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v6
-        with:
+        uses: actions/checkout@v4
+        with: 
          submodules: true
      - name: Dependencies
        run: |
          sudo apt-get update
-          sudo apt-get install -y build-essential ffmpeg
+          sudo apt-get install build-essential ffmpeg
+          curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
+             sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
+              gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
+             sudo apt-get update && \
+             sudo apt-get install -y conda
          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
          sudo apt-get install -y libopencv-dev
-          # Install UV
-          curl -LsSf https://astral.sh/uv/install.sh | sh
-          pip install --user --no-cache-dir grpcio-tools==1.64.1
+          pip install --user grpcio-tools
+          
+          sudo rm -rfv /usr/bin/conda || true
+
      - name: Test diffusers
        run: |
-          make --jobs=5 --output-sync=target -C backend/python/diffusers
-          make --jobs=5 --output-sync=target -C backend/python/diffusers test
+           export PATH=$PATH:/opt/conda/bin
+           make --jobs=5 --output-sync=target -C backend/python/diffusers
+           make --jobs=5 --output-sync=target -C backend/python/diffusers test

-  #tests-vllm:
-  #  runs-on: ubuntu-latest
-  #  steps:
-  #    - name: Clone
-  #      uses: actions/checkout@v6
-  #      with:
-  #        submodules: true
-  #    - name: Dependencies
-  #      run: |
-  #        sudo apt-get update
-  #        sudo apt-get install -y build-essential ffmpeg
-  #        sudo apt-get install -y ca-certificates cmake curl patch python3-pip
-  #        sudo apt-get install -y libopencv-dev
-  #        # Install UV
-  #        curl -LsSf https://astral.sh/uv/install.sh | sh
-  #        pip install --user --no-cache-dir grpcio-tools==1.64.1
-  #    - name: Test vllm backend
-  #      run: |
-  #        make --jobs=5 --output-sync=target -C backend/python/vllm
-  #        make --jobs=5 --output-sync=target -C backend/python/vllm test
-  # tests-transformers-musicgen:
+  tests-parler-tts:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v4
+        with: 
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install build-essential ffmpeg
+          curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
+             sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
+              gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
+             sudo apt-get update && \
+             sudo apt-get install -y conda
+          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
+          sudo apt-get install -y libopencv-dev
+          pip install --user grpcio-tools
+          
+          sudo rm -rfv /usr/bin/conda || true
+
+      - name: Test parler-tts
+        run: |
+           export PATH=$PATH:/opt/conda/bin
+           make --jobs=5 --output-sync=target -C backend/python/parler-tts
+           make --jobs=5 --output-sync=target -C backend/python/parler-tts test
+
+  tests-transformers-musicgen:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clone
+        uses: actions/checkout@v4
+        with: 
+          submodules: true
+      - name: Dependencies
+        run: |
+          sudo apt-get update
+          sudo apt-get install build-essential ffmpeg
+          curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
+             sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
+              gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
+             sudo apt-get update && \
+             sudo apt-get install -y conda
+          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
+          sudo apt-get install -y libopencv-dev
+          pip install --user grpcio-tools
+          
+          sudo rm -rfv /usr/bin/conda || true
+
+      - name: Test transformers-musicgen
+        run: |
+           export PATH=$PATH:/opt/conda/bin
+           make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen
+           make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen test
+
+
+
+  # tests-petals:
  #   runs-on: ubuntu-latest
  #   steps:
  #     - name: Clone
-  #       uses: actions/checkout@v6
-  #       with:
+  #       uses: actions/checkout@v4
+  #       with: 
  #         submodules: true
  #     - name: Dependencies
  #       run: |
  #         sudo apt-get update
  #         sudo apt-get install build-essential ffmpeg
-  #         # Install UV
-  #         curl -LsSf https://astral.sh/uv/install.sh | sh
+  #         curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
+  #            sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
+  #             gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
+  #            sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
+  #            sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
+  #            sudo apt-get update && \
+  #            sudo apt-get install -y conda
  #         sudo apt-get install -y ca-certificates cmake curl patch python3-pip
  #         sudo apt-get install -y libopencv-dev
-  #         pip install --user --no-cache-dir grpcio-tools==1.64.1
+  #         pip install --user grpcio-tools
+          
+  #         sudo rm -rfv /usr/bin/conda || true

-  #     - name: Test transformers-musicgen
+  #     - name: Test petals
  #       run: |
-  #          make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen
-  #          make --jobs=5 --output-sync=target -C backend/python/transformers-musicgen test
+  #          export PATH=$PATH:/opt/conda/bin
+  #          make --jobs=5 --output-sync=target -C backend/python/petals
+  #          make --jobs=5 --output-sync=target -C backend/python/petals test
+
+           

  # tests-bark:
  #   runs-on: ubuntu-latest
@@ -223,308 +242,114 @@ jobs:
  #           sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
  #           df -h
  #     - name: Clone
-  #       uses: actions/checkout@v6
-  #       with:
+  #       uses: actions/checkout@v4
+  #       with: 
  #         submodules: true
  #     - name: Dependencies
  #       run: |
  #         sudo apt-get update
  #         sudo apt-get install build-essential ffmpeg
-  #         # Install UV
-  #         curl -LsSf https://astral.sh/uv/install.sh | sh
+  #         curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
+  #            sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
+  #             gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
+  #            sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
+  #            sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
+  #            sudo apt-get update && \
+  #            sudo apt-get install -y conda
  #         sudo apt-get install -y ca-certificates cmake curl patch python3-pip
  #         sudo apt-get install -y libopencv-dev
-  #         pip install --user --no-cache-dir grpcio-tools==1.64.1
+  #         pip install --user grpcio-tools
+          
+  #         sudo rm -rfv /usr/bin/conda || true

  #     - name: Test bark
  #       run: |
+  #          export PATH=$PATH:/opt/conda/bin
  #          make --jobs=5 --output-sync=target -C backend/python/bark
  #          make --jobs=5 --output-sync=target -C backend/python/bark test

-
+           
  # Below tests needs GPU. Commented out for now
  # TODO: Re-enable as soon as we have GPU nodes
  # tests-vllm:
  #   runs-on: ubuntu-latest
  #   steps:
  #     - name: Clone
-  #       uses: actions/checkout@v6
-  #       with:
+  #       uses: actions/checkout@v4
+  #       with: 
  #         submodules: true
  #     - name: Dependencies
  #       run: |
  #         sudo apt-get update
  #         sudo apt-get install build-essential ffmpeg
-  #         # Install UV
-  #         curl -LsSf https://astral.sh/uv/install.sh | sh
+  #         curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
+  #            sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
+  #             gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
+  #            sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
+  #            sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
+  #            sudo apt-get update && \
+  #            sudo apt-get install -y conda
  #         sudo apt-get install -y ca-certificates cmake curl patch python3-pip
  #         sudo apt-get install -y libopencv-dev
-  #         pip install --user --no-cache-dir grpcio-tools==1.64.1
+  #         pip install --user grpcio-tools
+  #         sudo rm -rfv /usr/bin/conda || true
  #     - name: Test vllm
  #       run: |
+  #          export PATH=$PATH:/opt/conda/bin
  #          make --jobs=5 --output-sync=target -C backend/python/vllm
  #          make --jobs=5 --output-sync=target -C backend/python/vllm test
-
-  tests-coqui:
-    needs: detect-changes
-    if: needs.detect-changes.outputs.coqui == 'true' || needs.detect-changes.outputs.run-all == 'true'
+  tests-vallex:
    runs-on: ubuntu-latest
    steps:
      - name: Clone
-        uses: actions/checkout@v6
-        with:
-          submodules: true
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y build-essential ffmpeg
-          sudo apt-get install -y ca-certificates cmake curl patch espeak espeak-ng python3-pip
-          # Install UV
-          curl -LsSf https://astral.sh/uv/install.sh | sh
-          pip install --user --no-cache-dir grpcio-tools==1.64.1
-      - name: Test coqui
-        run: |
-          make --jobs=5 --output-sync=target -C backend/python/coqui
-          make --jobs=5 --output-sync=target -C backend/python/coqui test
-  tests-moonshine:
-    needs: detect-changes
-    if: needs.detect-changes.outputs.moonshine == 'true' || needs.detect-changes.outputs.run-all == 'true'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Clone
-        uses: actions/checkout@v6
-        with:
-          submodules: true
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y build-essential ffmpeg
-          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
-          # Install UV
-          curl -LsSf https://astral.sh/uv/install.sh | sh
-          pip install --user --no-cache-dir grpcio-tools==1.64.1
-      - name: Test moonshine
-        run: |
-          make --jobs=5 --output-sync=target -C backend/python/moonshine
-          make --jobs=5 --output-sync=target -C backend/python/moonshine test
-  tests-pocket-tts:
-    needs: detect-changes
-    if: needs.detect-changes.outputs.pocket-tts == 'true' || needs.detect-changes.outputs.run-all == 'true'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Clone
-        uses: actions/checkout@v6
-        with:
-          submodules: true
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y build-essential ffmpeg
-          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
-          # Install UV
-          curl -LsSf https://astral.sh/uv/install.sh | sh
-          pip install --user --no-cache-dir grpcio-tools==1.64.1
-      - name: Test pocket-tts
-        run: |
-          make --jobs=5 --output-sync=target -C backend/python/pocket-tts
-          make --jobs=5 --output-sync=target -C backend/python/pocket-tts test
-  tests-qwen-tts:
-    needs: detect-changes
-    if: needs.detect-changes.outputs.qwen-tts == 'true' || needs.detect-changes.outputs.run-all == 'true'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Clone
-        uses: actions/checkout@v6
-        with:
-          submodules: true
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y build-essential ffmpeg
-          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
-          # Install UV
-          curl -LsSf https://astral.sh/uv/install.sh | sh
-          pip install --user --no-cache-dir grpcio-tools==1.64.1
-      - name: Test qwen-tts
-        run: |
-          make --jobs=5 --output-sync=target -C backend/python/qwen-tts
-          make --jobs=5 --output-sync=target -C backend/python/qwen-tts test
-  # TODO: s2-pro model is too large to load on CPU-only CI runners — re-enable
-  # when we have GPU runners or a smaller test model.
-  # tests-fish-speech:
-  #   runs-on: ubuntu-latest
-  #   timeout-minutes: 45
-  #   steps:
-  #     - name: Clone
-  #       uses: actions/checkout@v6
-  #       with:
-  #         submodules: true
-  #     - name: Dependencies
-  #       run: |
-  #         sudo apt-get update
-  #         sudo apt-get install -y build-essential ffmpeg portaudio19-dev
-  #         sudo apt-get install -y ca-certificates cmake curl patch python3-pip
-  #         # Install UV
-  #         curl -LsSf https://astral.sh/uv/install.sh | sh
-  #         pip install --user --no-cache-dir grpcio-tools==1.64.1
-  #     - name: Test fish-speech
-  #       run: |
-  #         make --jobs=5 --output-sync=target -C backend/python/fish-speech
-  #         make --jobs=5 --output-sync=target -C backend/python/fish-speech test
-  tests-qwen-asr:
-    needs: detect-changes
-    if: needs.detect-changes.outputs.qwen-asr == 'true' || needs.detect-changes.outputs.run-all == 'true'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Clone
-        uses: actions/checkout@v6
-        with:
-          submodules: true
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y build-essential ffmpeg sox
-          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
-          # Install UV
-          curl -LsSf https://astral.sh/uv/install.sh | sh
-          pip install --user --no-cache-dir grpcio-tools==1.64.1
-      - name: Test qwen-asr
-        run: |
-          make --jobs=5 --output-sync=target -C backend/python/qwen-asr
-          make --jobs=5 --output-sync=target -C backend/python/qwen-asr test
-  tests-nemo:
-    needs: detect-changes
-    if: needs.detect-changes.outputs.nemo == 'true' || needs.detect-changes.outputs.run-all == 'true'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Clone
-        uses: actions/checkout@v6
-        with:
-          submodules: true
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y build-essential ffmpeg sox
-          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
-          # Install UV
-          curl -LsSf https://astral.sh/uv/install.sh | sh
-          pip install --user --no-cache-dir grpcio-tools==1.64.1
-      - name: Test nemo
-        run: |
-          make --jobs=5 --output-sync=target -C backend/python/nemo
-          make --jobs=5 --output-sync=target -C backend/python/nemo test
-  tests-voxcpm:
-    needs: detect-changes
-    if: needs.detect-changes.outputs.voxcpm == 'true' || needs.detect-changes.outputs.run-all == 'true'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Clone
-        uses: actions/checkout@v6
-        with:
+        uses: actions/checkout@v4
+        with: 
          submodules: true
      - name: Dependencies
        run: |
          sudo apt-get update
          sudo apt-get install build-essential ffmpeg
+          curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
+             sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
+              gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
+             sudo apt-get update && \
+             sudo apt-get install -y conda
          sudo apt-get install -y ca-certificates cmake curl patch python3-pip
-          # Install UV
-          curl -LsSf https://astral.sh/uv/install.sh | sh
-          pip install --user --no-cache-dir grpcio-tools==1.64.1
-      - name: Test voxcpm
+          sudo apt-get install -y libopencv-dev
+          pip install --user grpcio-tools
+          sudo rm -rfv /usr/bin/conda || true
+      - name: Test vall-e-x
        run: |
-          make --jobs=5 --output-sync=target -C backend/python/voxcpm
-          make --jobs=5 --output-sync=target -C backend/python/voxcpm test
-  tests-llama-cpp-quantization:
-    needs: detect-changes
-    if: needs.detect-changes.outputs.llama-cpp-quantization == 'true' || needs.detect-changes.outputs.run-all == 'true'
+           export PATH=$PATH:/opt/conda/bin
+           make --jobs=5 --output-sync=target -C backend/python/vall-e-x
+           make --jobs=5 --output-sync=target -C backend/python/vall-e-x test
+
+  tests-coqui:
    runs-on: ubuntu-latest
-    timeout-minutes: 30
    steps:
      - name: Clone
-        uses: actions/checkout@v6
-        with:
+        uses: actions/checkout@v4
+        with: 
          submodules: true
      - name: Dependencies
        run: |
          sudo apt-get update
-          sudo apt-get install -y build-essential cmake curl git python3-pip
-          # Install UV
-          curl -LsSf https://astral.sh/uv/install.sh | sh
-          pip install --user --no-cache-dir grpcio-tools==1.64.1
-      - name: Build llama-quantize from llama.cpp
+          sudo apt-get install build-essential ffmpeg
+          curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
+             sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
+              gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
+             sudo apt-get update && \
+             sudo apt-get install -y conda
+          sudo apt-get install -y ca-certificates cmake curl patch espeak espeak-ng python3-pip
+          pip install --user grpcio-tools
+          sudo rm -rfv /usr/bin/conda || true
+
+      - name: Test coqui
        run: |
-          git clone --depth 1 https://github.com/ggml-org/llama.cpp.git /tmp/llama.cpp
-          cmake -B /tmp/llama.cpp/build -S /tmp/llama.cpp -DGGML_NATIVE=OFF
-          cmake --build /tmp/llama.cpp/build --target llama-quantize -j$(nproc)
-          sudo cp /tmp/llama.cpp/build/bin/llama-quantize /usr/local/bin/
-      - name: Install backend
-        run: |
-          make --jobs=5 --output-sync=target -C backend/python/llama-cpp-quantization
-      - name: Test llama-cpp-quantization
-        run: |
-          make --jobs=5 --output-sync=target -C backend/python/llama-cpp-quantization test
-  tests-acestep-cpp:
-    needs: detect-changes
-    if: needs.detect-changes.outputs.acestep-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Clone
-        uses: actions/checkout@v6
-        with:
-          submodules: true
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y build-essential cmake curl libopenblas-dev ffmpeg
-      - name: Setup Go
-        uses: actions/setup-go@v5
-      - name: Display Go version
-        run: go version
-      - name: Proto Dependencies
-        run: |
-          # Install protoc
-          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
-          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
-          rm protoc.zip
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-          PATH="$PATH:$HOME/go/bin" make protogen-go
-      - name: Build acestep-cpp
-        run: |
-          make --jobs=5 --output-sync=target -C backend/go/acestep-cpp
-      - name: Test acestep-cpp
-        run: |
-          make --jobs=5 --output-sync=target -C backend/go/acestep-cpp test
-  tests-voxtral:
-    needs: detect-changes
-    if: needs.detect-changes.outputs.voxtral == 'true' || needs.detect-changes.outputs.run-all == 'true'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Clone
-        uses: actions/checkout@v6
-        with:
-          submodules: true
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y build-essential cmake curl libopenblas-dev ffmpeg
-      - name: Setup Go
-        uses: actions/setup-go@v5
-      # You can test your matrix by printing the current Go version
-      - name: Display Go version
-        run: go version
-      - name: Proto Dependencies
-        run: |
-          # Install protoc
-          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
-          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
-          rm protoc.zip
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-          PATH="$PATH:$HOME/go/bin" make protogen-go
-      - name: Build voxtral
-        run: |
-          make --jobs=5 --output-sync=target -C backend/go/voxtral
-      - name: Test voxtral
-        run: |
-          make --jobs=5 --output-sync=target -C backend/go/voxtral test
+           export PATH=$PATH:/opt/conda/bin
+           make --jobs=5 --output-sync=target -C backend/python/coqui
+           make --jobs=5 --output-sync=target -C backend/python/coqui test
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -10,7 +10,7 @@ on:
      - '*'

 env:
-  GRPC_VERSION: v1.65.0
+  GRPC_VERSION: v1.58.0

 concurrency:
  group: ci-tests-${{ github.head_ref || github.ref }}-${{ github.repository }}
@@ -21,22 +21,8 @@ jobs:
    runs-on: ubuntu-latest
    strategy:
      matrix:
-        go-version: ['1.26.x']
+        go-version: ['1.21.x']
    steps:
-      - name: Free Disk Space (Ubuntu)
-        uses: jlumbroso/free-disk-space@main
-        with:
-          # this might remove tools that are actually needed,
-          # if set to "true" but frees about 6 GB
-          tool-cache: true
-          # all of these default to true, but feel free to set to
-          # "false" if necessary for your workflow
-          android: true
-          dotnet: true
-          haskell: true
-          large-packages: true
-          docker-images: true
-          swap-storage: true
      - name: Release space from worker
        run: |
          echo "Listing top largest packages"
@@ -70,8 +56,8 @@ jobs:
          sudo rm -rfv build || true
          df -h
      - name: Clone
-        uses: actions/checkout@v6
-        with:
+        uses: actions/checkout@v4
+        with: 
          submodules: true
      - name: Setup Go ${{ matrix.go-version }}
        uses: actions/setup-go@v5
@@ -81,42 +67,65 @@ jobs:
      # You can test your matrix by printing the current Go version
      - name: Display Go version
        run: go version
-      - name: Proto Dependencies
-        run: |
-          # Install protoc
-          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
-          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
-          rm protoc.zip
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-          PATH="$PATH:$HOME/go/bin" make protogen-go
      - name: Dependencies
        run: |
          sudo apt-get update
-          sudo apt-get install curl ffmpeg libopus-dev
-      - name: Setup Node.js
-        uses: actions/setup-node@v6
+          sudo apt-get install build-essential curl ffmpeg
+          curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
+             sudo install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
+             gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list' && \
+             sudo /bin/bash -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list' && \
+             sudo apt-get update && \
+             sudo apt-get install -y conda
+          sudo apt-get install -y ca-certificates cmake patch python3-pip unzip
+          sudo apt-get install -y libopencv-dev
+
+          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
+          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+          rm protoc.zip
+
+          go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
+          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
+
+          # The python3-grpc-tools package in 22.04 is too old
+          pip install --user grpcio-tools
+
+          sudo rm -rfv /usr/bin/conda || true
+          PATH=$PATH:/opt/conda/bin make -C backend/python/sentencetransformers
+
+          # Pre-build piper before we start tests in order to have shared libraries in place
+          make sources/go-piper && \
+          GO_TAGS="tts" make -C sources/go-piper piper.o && \
+          sudo cp -rfv sources/go-piper/piper-phonemize/pi/lib/. /usr/lib/ && \
+          # Pre-build stable diffusion before we install a newer version of abseil (not compatible with stablediffusion-ncn)
+          PATH="$PATH:/root/go/bin" GO_TAGS="stablediffusion tts" GRPC_BACKENDS=backend-assets/grpc/stablediffusion make build
+      - name: Cache grpc
+        id: cache-grpc
+        uses: actions/cache@v4
        with:
-          node-version: '22'
-      - name: Build React UI
-        run: make react-ui
-      - name: Build backends
+          path: grpc
+          key: ${{ runner.os }}-grpc-${{ env.GRPC_VERSION }}
+      - name: Build grpc
+        if: steps.cache-grpc.outputs.cache-hit != 'true'
        run: |
-          make backends/transformers
-          mkdir external && mv backends/transformers external/transformers
-          make backends/llama-cpp backends/local-store backends/silero-vad backends/piper backends/whisper backends/stablediffusion-ggml
+          git clone --recurse-submodules -b ${{ env.GRPC_VERSION }} --depth 1 --jobs 5 --shallow-submodules https://github.com/grpc/grpc && \
+          cd grpc && mkdir -p cmake/build && cd cmake/build && cmake -DgRPC_INSTALL=ON \
+            -DgRPC_BUILD_TESTS=OFF \
+            ../.. && sudo make --jobs 5
+      - name: Install gRPC
+        run: |
+          cd grpc && cd cmake/build && sudo make --jobs 5 install
      - name: Test
        run: |
-          TRANSFORMER_BACKEND=$PWD/external/transformers/run.sh PATH="$PATH:/root/go/bin" GO_TAGS="tts" make --jobs 5 --output-sync=target test
+          PATH="$PATH:/root/go/bin" GO_TAGS="stablediffusion tts" make --jobs 5 --output-sync=target test
      - name: Setup tmate session if tests fail
        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.23
+        uses: mxschmitt/action-tmate@v3.18
        with:
-          detached: true
          connect-timeout-seconds: 180
-          limit-access-to-actor: true

-  tests-e2e-container:
+  tests-aio-container:
    runs-on: ubuntu-latest
    steps:
      - name: Release space from worker
@@ -152,38 +161,32 @@ jobs:
          sudo rm -rfv build || true
          df -h
      - name: Clone
-        uses: actions/checkout@v6
-        with:
+        uses: actions/checkout@v4
+        with: 
          submodules: true
-      - name: Dependencies
+      - name: Build images
        run: |
-          # Install protoc
-          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
-          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
-          rm protoc.zip
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-          PATH="$PATH:$HOME/go/bin" make protogen-go
+          docker build --build-arg FFMPEG=true --build-arg IMAGE_TYPE=core --build-arg MAKEFLAGS="--jobs=5 --output-sync=target" -t local-ai:tests -f Dockerfile .
+          BASE_IMAGE=local-ai:tests DOCKER_AIO_IMAGE=local-ai-aio:test make docker-aio
      - name: Test
        run: |
-            PATH="$PATH:$HOME/go/bin" make backends/local-store backends/silero-vad backends/llama-cpp backends/whisper backends/piper backends/stablediffusion-ggml docker-build-e2e e2e-aio
+          LOCALAI_MODELS_DIR=$PWD/models LOCALAI_IMAGE_TAG=test LOCALAI_IMAGE=local-ai-aio \
+            make run-e2e-aio
      - name: Setup tmate session if tests fail
        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.23
+        uses: mxschmitt/action-tmate@v3.18
        with:
-          detached: true
          connect-timeout-seconds: 180
-          limit-access-to-actor: true

  tests-apple:
-    runs-on: macos-latest
+    runs-on: macOS-14
    strategy:
      matrix:
-        go-version: ['1.26.x']
+        go-version: ['1.21.x']
    steps:
      - name: Clone
-        uses: actions/checkout@v6
-        with:
+        uses: actions/checkout@v4
+        with: 
          submodules: true
      - name: Setup Go ${{ matrix.go-version }}
        uses: actions/setup-go@v5
@@ -195,31 +198,17 @@ jobs:
        run: go version
      - name: Dependencies
        run: |
-          brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm opus
-          pip install --user --no-cache-dir grpcio-tools grpcio
-      - name: Setup Node.js
-        uses: actions/setup-node@v6
-        with:
-          node-version: '22'
-      - name: Build React UI
-        run: make react-ui
-      - name: Build llama-cpp-darwin
-        run: |
-          make protogen-go
-          make backends/llama-cpp-darwin
+          brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc
+          pip install --user grpcio-tools
      - name: Test
        run: |
          export C_INCLUDE_PATH=/usr/local/include
          export CPLUS_INCLUDE_PATH=/usr/local/include
-          export CC=/opt/homebrew/opt/llvm/bin/clang
          # Used to run the newer GNUMake version from brew that supports --output-sync
          export PATH="/opt/homebrew/opt/make/libexec/gnubin:$PATH"
-          PATH="$PATH:$HOME/go/bin" make protogen-go
-          PATH="$PATH:$HOME/go/bin" BUILD_TYPE="GITHUB_CI_HAS_BROKEN_METAL" CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF" make --jobs 4 --output-sync=target test
+          BUILD_TYPE="GITHUB_CI_HAS_BROKEN_METAL" CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" make --jobs 4 --output-sync=target test
      - name: Setup tmate session if tests fail
        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.23
+        uses: mxschmitt/action-tmate@v3.18
        with:
-          detached: true
-          connect-timeout-seconds: 180
-          limit-access-to-actor: true
+          connect-timeout-seconds: 180
--- a/.github/workflows/tests-e2e.yml
+++ b/.github/workflows/tests-e2e.yml
@@ -1,62 +0,0 @@
---
-name: 'E2E Backend Tests'
-
-on:
-  pull_request:
-  push:
-    branches:
-      - master
-    tags:
-      - '*'
-
-concurrency:
-  group: ci-tests-e2e-backend-${{ github.head_ref || github.ref }}-${{ github.repository }}
-  cancel-in-progress: true
-
-jobs:
-  tests-e2e-backend:
-    runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        go-version: ['1.25.x']
-    steps:
-      - name: Clone
-        uses: actions/checkout@v6
-        with:
-          submodules: true
-      - name: Setup Go ${{ matrix.go-version }}
-        uses: actions/setup-go@v5
-        with:
-          go-version: ${{ matrix.go-version }}
-          cache: false
-      - name: Display Go version
-        run: go version
-      - name: Proto Dependencies
-        run: |
-          # Install protoc
-          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
-          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
-          rm protoc.zip
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-          PATH="$PATH:$HOME/go/bin" make protogen-go
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y build-essential libopus-dev
-      - name: Setup Node.js
-        uses: actions/setup-node@v6
-        with:
-          node-version: '22'
-      - name: Build React UI
-        run: make react-ui
-      - name: Test Backend E2E
-        run: |
-          PATH="$PATH:$HOME/go/bin" make build-mock-backend test-e2e
-      - name: Setup tmate session if tests fail
-        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.23
-        with:
-          detached: true
-          connect-timeout-seconds: 180
-          limit-access-to-actor: true
--- a/.github/workflows/tests-ui-e2e.yml
+++ b/.github/workflows/tests-ui-e2e.yml
@@ -1,72 +0,0 @@
---
-name: 'UI E2E Tests'
-
-on:
-  pull_request:
-    paths:
-      - 'core/http/**'
-      - 'tests/e2e-ui/**'
-      - 'tests/e2e/mock-backend/**'
-  push:
-    branches:
-      - master
-
-concurrency:
-  group: ci-tests-ui-e2e-${{ github.head_ref || github.ref }}-${{ github.repository }}
-  cancel-in-progress: true
-
-jobs:
-  tests-ui-e2e:
-    runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        go-version: ['1.26.x']
-    steps:
-      - name: Clone
-        uses: actions/checkout@v6
-        with:
-          submodules: true
-      - name: Setup Go ${{ matrix.go-version }}
-        uses: actions/setup-go@v5
-        with:
-          go-version: ${{ matrix.go-version }}
-          cache: false
-      - name: Setup Node.js
-        uses: actions/setup-node@v6
-        with:
-          node-version: '22'
-      - name: Proto Dependencies
-        run: |
-          curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
-          unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
-          rm protoc.zip
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-      - name: System Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y build-essential libopus-dev
-      - name: Build UI test server
-        run: PATH="$PATH:$HOME/go/bin" make build-ui-test-server
-      - name: Install Playwright
-        working-directory: core/http/react-ui
-        run: |
-          npm install
-          npx playwright install --with-deps chromium
-      - name: Run Playwright tests
-        working-directory: core/http/react-ui
-        run: npx playwright test
-      - name: Upload Playwright report
-        if: ${{ failure() }}
-        uses: actions/upload-artifact@v7
-        with:
-          name: playwright-report
-          path: core/http/react-ui/playwright-report/
-          retention-days: 7
-      - name: Setup tmate session if tests fail
-        if: ${{ failure() }}
-        uses: mxschmitt/action-tmate@v3.23
-        with:
-          detached: true
-          connect-timeout-seconds: 180
-          limit-access-to-actor: true
--- a/.github/workflows/update_swagger.yaml
+++ b/.github/workflows/update_swagger.yaml
@@ -1,38 +0,0 @@
-name: Update swagger
-on:
-  schedule:
-    - cron: 0 20 * * *
-  workflow_dispatch:
-jobs:
-  swagger:
-    if: github.repository == 'mudler/LocalAI'
-    strategy:
-      fail-fast: false
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v6
-      - uses: actions/setup-go@v5
-        with:
-          go-version: 'stable'
-      - name: Dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install protobuf-compiler
-      - run: |
-          go install github.com/swaggo/swag/cmd/swag@latest
-          go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
-          go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2
-      - name: Bump swagger 🔧
-        run: |
-          make protogen-go swagger
-      - name: Create Pull Request
-        uses: peter-evans/create-pull-request@v8
-        with:
-          token: ${{ secrets.UPDATE_BOT_TOKEN }}
-          push-to-fork: ci-forks/LocalAI
-          commit-message: 'feat(swagger): update swagger'
-          title: 'feat(swagger): update swagger'
-          branch: "update/swagger"
-          body:  Update swagger
-          signoff: true
-
--- a/.github/workflows/yaml-check.yml
+++ b/.github/workflows/yaml-check.yml
@@ -1,26 +0,0 @@
-name: 'Yamllint GitHub Actions'
-on:
-  - pull_request
-jobs:
-  yamllint:
-    name: 'Yamllint'
-    runs-on: ubuntu-latest
-    steps:
-      - name: 'Checkout'
-        uses: actions/checkout@master
-      - name: 'Yamllint model gallery'
-        uses: karancode/yamllint-github-action@master
-        with:
-          yamllint_file_or_dir: 'gallery'
-          yamllint_strict: false
-          yamllint_comment: true
-        env:
-          GITHUB_ACCESS_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-      - name: 'Yamllint Backend gallery'
-        uses: karancode/yamllint-github-action@master
-        with:
-          yamllint_file_or_dir: 'backend'
-          yamllint_strict: false
-          yamllint_comment: true
-        env:
-          GITHUB_ACCESS_TOKEN: ${{ secrets.GITHUB_TOKEN }}
--- a/.gitignore
+++ b/.gitignore
@@ -2,30 +2,21 @@
 /sources/
 __pycache__/
 *.a
-*.o
 get-sources
 prepare-sources
-/backend/cpp/llama-cpp/grpc-server
-/backend/cpp/llama-cpp/llama.cpp
-/backend/cpp/llama-*
-!backend/cpp/llama-cpp
-/backends
-/backend-images
-/result.yaml
-protoc
-
-*.log
+/backend/cpp/llama/grpc-server
+/backend/cpp/llama/llama.cpp

 go-ggml-transformers
 go-gpt2
+go-rwkv
 whisper.cpp
 /bloomz
 go-bert

 # LocalAI build binary
 LocalAI
-/local-ai
-/local-ai-launcher
+local-ai
 # prevent above rules from omitting the helm chart
 !charts/*
 # prevent above rules from omitting the api/localai folder
@@ -36,8 +27,6 @@ LocalAI
 models/*
 test-models/
 test-dir/
-tests/e2e-aio/backends
-mock-backend

 release/

@@ -50,30 +39,8 @@ backend-assets/*
 !backend-assets/.keep
 prepare
 /ggml-metal.metal
-docs/static/gallery.html

 # Protobuf generated files
 *.pb.go
 *pb2.py
 *pb2_grpc.py
-
-# SonarQube
-.scannerwork
-
-# backend virtual environments
-**/venv
-
-# per-developer customization files for the development container
-.devcontainer/customization/*
-
-# React UI build artifacts (keep placeholder dist/index.html)
-core/http/react-ui/node_modules/
-core/http/react-ui/dist
-
-# Extracted backend binaries for container-based testing
-local-backends/
-
-# UI E2E test artifacts
-tests/e2e-ui/ui-test-server
-core/http/react-ui/playwright-report/
-core/http/react-ui/test-results/
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,3 +1,6 @@
 [submodule "docs/themes/hugo-theme-relearn"]
 	path = docs/themes/hugo-theme-relearn
 	url = https://github.com/McShelby/hugo-theme-relearn.git
+[submodule "docs/themes/lotusdocs"]
+	path = docs/themes/lotusdocs
+	url = https://github.com/colinwilson/lotusdocs
--- a/.goreleaser.yaml
+++ b/.goreleaser.yaml
@@ -1,37 +0,0 @@
-version: 2
-before:
-  hooks:
-    - make protogen-go
-    - make react-ui
-    - go mod tidy
-dist: release
-source:
-  enabled: true
-  name_template: '{{ .ProjectName }}-{{ .Tag }}-source'
-builds:
-  - main: ./cmd/local-ai
-    env:
-      - CGO_ENABLED=0
-    ldflags:
-      - -s -w
-      - -X "github.com/mudler/LocalAI/internal.Version={{ .Tag }}"
-      - -X "github.com/mudler/LocalAI/internal.Commit={{ .FullCommit }}"
-    goos:
-      - linux
-      - darwin
-      #- windows
-    goarch:
-      - amd64
-      - arm64
-    ignore:
-      - goos: darwin
-        goarch: amd64
-archives:
-  - formats: [ 'binary' ] # this removes the tar of the archives, leaving the binaries alone
-    name_template: local-ai-{{ .Tag }}-{{ .Os }}-{{ .Arch }}{{ if .Arm }}v{{ .Arm }}{{ end }}
-checksum:
-  name_template: '{{ .ProjectName }}-{{ .Tag }}-checksums.txt'
-snapshot:
-  version_template: "{{ .Tag }}-next"
-changelog:
-  use: github-native
--- a/.vscode/launch.json
+++ b/.vscode/launch.json
@@ -3,12 +3,12 @@
    "configurations": [
        {
            "name": "Python: Current File",
-            "type": "debugpy",
+            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "justMyCode": false,
-            "cwd": "${fileDirname}",
+            "cwd": "${workspaceFolder}/examples/langchain-chroma",
            "env": {
                "OPENAI_API_BASE": "http://localhost:8080/v1",
                "OPENAI_API_KEY": "abc"
@@ -19,16 +19,15 @@
            "type": "go",
            "request": "launch",
            "mode": "debug",
-            "program": "${workspaceRoot}",
-            "args": [],
+            "program": "${workspaceFolder}/main.go",
+            "args": [
+                "api"
+            ],
            "env": {
-                "LOCALAI_LOG_LEVEL": "debug",
-                "LOCALAI_P2P": "true",
-                "LOCALAI_FEDERATED": "true"
-            },
-            "buildFlags": ["-tags", "", "-v"],
-            "envFile": "${workspaceFolder}/.env",
-            "cwd": "${workspaceRoot}"
+                "C_INCLUDE_PATH": "${workspaceFolder}/go-llama:${workspaceFolder}/go-stable-diffusion/:${workspaceFolder}/gpt4all/gpt4all-bindings/golang/:${workspaceFolder}/go-gpt2:${workspaceFolder}/go-rwkv:${workspaceFolder}/whisper.cpp:${workspaceFolder}/go-bert:${workspaceFolder}/bloomz",
+                "LIBRARY_PATH": "${workspaceFolder}/go-llama:${workspaceFolder}/go-stable-diffusion/:${workspaceFolder}/gpt4all/gpt4all-bindings/golang/:${workspaceFolder}/go-gpt2:${workspaceFolder}/go-rwkv:${workspaceFolder}/whisper.cpp:${workspaceFolder}/go-bert:${workspaceFolder}/bloomz",
+                "DEBUG": "true"
+            }
        }
    ]
 }
--- a/.yamllint
+++ b/.yamllint
@@ -1,4 +0,0 @@
-extends: default
-
-rules:
-    line-length: disable
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,24 +0,0 @@
-# LocalAI Agent Instructions
-
-This file is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
-
-## Topics
-
-| File | When to read |
-|------|-------------|
-| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
-| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist |
-| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
-| [.agents/llama-cpp-backend.md](.agents/llama-cpp-backend.md) | Working on the llama.cpp backend — architecture, updating, tool call parsing |
-| [.agents/testing-mcp-apps.md](.agents/testing-mcp-apps.md) | Testing MCP Apps (interactive tool UIs) in the React UI |
-| [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) | Adding API endpoints, auth middleware, feature permissions, user access control |
-| [.agents/debugging-backends.md](.agents/debugging-backends.md) | Debugging runtime backend failures, dependency conflicts, rebuilding backends |
-
-## Quick Reference
-
- **Logging**: Use `github.com/mudler/xlog` (same API as slog)
- **Go style**: Prefer `any` over `interface{}`
- **Comments**: Explain *why*, not *what*
- **Docs**: Update `docs/content/` when adding features or changing config
- **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
- **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1 +0,0 @@
-AGENTS.md
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -7,136 +7,31 @@ Thank you for your interest in contributing to LocalAI! We appreciate your time
 - [Getting Started](#getting-started)
  - [Prerequisites](#prerequisites)
  - [Setting up the Development Environment](#setting-up-the-development-environment)
-  - [Environment Variables](#environment-variables)
 - [Contributing](#contributing)
  - [Submitting an Issue](#submitting-an-issue)
-  - [Development Workflow](#development-workflow)
  - [Creating a Pull Request (PR)](#creating-a-pull-request-pr)
 - [Coding Guidelines](#coding-guidelines)
 - [Testing](#testing)
 - [Documentation](#documentation)
 - [Community and Communication](#community-and-communication)

+
+
 ## Getting Started

 ### Prerequisites

- **Go 1.21+** (the project currently uses Go 1.26 in `go.mod`, but 1.21 is the minimum supported version)
-  - [Download Go](https://go.dev/dl/) or install via your package manager
-  - macOS: `brew install go`
-  - Ubuntu/Debian: follow the [official instructions](https://go.dev/doc/install) (the `apt` version is often outdated)
-  - Verify: `go version`
- **Git**
- **GNU Make**
- **GCC / C/C++ toolchain** (required for CGo and native backends)
- **Protocol Buffers compiler** (`protoc`) — needed for gRPC code generation
+- Golang [1.21]
+- Git
+- macOS/Linux

-#### System dependencies by platform
+### Setting up the Development Environment and running localAI in the local environment

-<details>
-<summary><strong>Ubuntu / Debian</strong></summary>
-
-```bash
-sudo apt-get update
-sudo apt-get install -y build-essential gcc g++ cmake git wget \
-  protobuf-compiler libprotobuf-dev pkg-config \
-  libopencv-dev libgrpc-dev
-```
-
-</details>
-
-<details>
-<summary><strong>CentOS / RHEL / Fedora</strong></summary>
-
-```bash
-sudo dnf groupinstall -y "Development Tools"
-sudo dnf install -y cmake git wget protobuf-compiler protobuf-devel \
-  opencv-devel grpc-devel
-```
-
-</details>
-
-<details>
-<summary><strong>macOS</strong></summary>
-
-```bash
-xcode-select --install
-brew install cmake git protobuf grpc opencv wget
-```
-
-</details>
-
-<details>
-<summary><strong>Windows</strong></summary>
-
-Use [WSL 2](https://learn.microsoft.com/en-us/windows/wsl/install) with an Ubuntu distribution, then follow the Ubuntu instructions above.
-
-</details>
-
-### Setting up the Development Environment
-
-1. **Clone the repository:**
-
-   ```bash
-   git clone https://github.com/mudler/LocalAI.git
-   cd LocalAI
-   ```
-
-2. **Build LocalAI:**
-
-   ```bash
-   make build
-   ```
-
-   This runs protobuf generation, installs Go tools, builds the React UI, and compiles the `local-ai` binary. Key build variables you can set:
-
-   | Variable | Description | Example |
-   |---|---|---|
-   | `BUILD_TYPE` | GPU/accelerator type (`cublas`, `hipblas`, `intel`, ``) | `BUILD_TYPE=cublas make build` |
-   | `GO_TAGS` | Additional Go build tags | `GO_TAGS=debug make build` |
-   | `CUDA_MAJOR_VERSION` | CUDA major version (default: `13`) | `CUDA_MAJOR_VERSION=12` |
-
-3. **Run LocalAI:**
-
-   ```bash
-   ./local-ai
-   ```
-
-4. **Development mode with live reload:**
-
-   ```bash
-   make build-dev
-   ```
-
-   This installs [`air`](https://github.com/air-verse/air) automatically and watches for file changes, rebuilding and restarting the server on each save.
-
-5. **Containerized build** (no local toolchain needed):
-
-   ```bash
-   make docker
-   ```
-
-   For GPU-specific Docker builds, see the `docker-build-*` targets in the Makefile and refer to [CLAUDE.md](CLAUDE.md) for detailed backend build instructions.
-
-### Environment Variables
-
-LocalAI is configured primarily through environment variables (or equivalent CLI flags). The most useful ones for development are:
-
-| Variable | Description | Default |
-|---|---|---|
-| `LOCALAI_DEBUG` | Enable debug mode | `false` |
-| `LOCALAI_LOG_LEVEL` | Log verbosity (`error`, `warn`, `info`, `debug`, `trace`) | — |
-| `LOCALAI_LOG_FORMAT` | Log format (`default`, `text`, `json`) | `default` |
-| `LOCALAI_MODELS_PATH` | Path to model files | `./models` |
-| `LOCALAI_BACKENDS_PATH` | Path to backend binaries | `./backends` |
-| `LOCALAI_CONFIG_DIR` | Directory for dynamic config files (API keys, external backends) | `./configuration` |
-| `LOCALAI_THREADS` | Number of threads for inference | — |
-| `LOCALAI_ADDRESS` | Bind address for the API server | `:8080` |
-| `LOCALAI_API_KEY` | API key(s) for authentication | — |
-| `LOCALAI_CORS` | Enable CORS | `false` |
-| `LOCALAI_DISABLE_WEBUI` | Disable the web UI | `false` |
-
-See `core/cli/run.go` for the full list of supported environment variables.
+1. Clone the repository: `git clone https://github.com/go-skynet/LocalAI.git`
+2. Navigate to the project directory: `cd LocalAI`
+3. Install the required dependencies (see https://localai.io/basics/build/#build-localai-locally)
+4. Build LocalAI: `make build`
+5. Run LocalAI: `./local-ai`

 ## Contributing

@@ -146,145 +41,48 @@ We welcome contributions from everyone! To get started, follow these steps:

 If you find a bug, have a feature request, or encounter any issues, please check the [issue tracker](https://github.com/go-skynet/LocalAI/issues) to see if a similar issue has already been reported. If not, feel free to [create a new issue](https://github.com/go-skynet/LocalAI/issues/new) and provide as much detail as possible.

-### Development Workflow
-
-#### Branch naming conventions
-
-Use a descriptive branch name that indicates the type and scope of the change:
-
- `feature/<short-description>` — new functionality
- `fix/<short-description>` — bug fixes
- `docs/<short-description>` — documentation changes
- `refactor/<short-description>` — code refactoring
-
-#### Commit messages
-
- Use a short, imperative subject line (e.g., "feat: add whisper backend support", not "Added whisper backend support")
- Keep the subject under 72 characters
- Use the body to explain **why** the change was made when the subject alone is not sufficient
- Use [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/)
-
-#### Creating a Pull Request (PR)
-
-Before jumping into a PR for a massive feature or big change, it is preferred to discuss it first via an issue.
+### Creating a Pull Request (PR)

 1. Fork the repository.
-2. Create a new branch: `git checkout -b feature/my-change`
-3. Make your changes, keeping commits focused and atomic.
-4. Run tests locally before pushing (see [Testing](#testing) below).
-5. Push to your fork: `git push origin feature/my-change`
-6. Open a pull request against the `master` branch.
-7. Fill in the PR description with:
-   - What the change does and why
-   - How it was tested
-   - Any breaking changes or migration steps
-8. Respond to review feedback promptly. Push follow-up commits rather than force-pushing amended commits so reviewers can see incremental changes.
-9. Once approved, a maintainer will merge your PR.
+2. Create a new branch with a descriptive name: `git checkout -b [branch name]`
+3. Make your changes and commit them.
+4. Push the changes to your fork: `git push origin [branch name]`
+5. Create a new pull request from your branch to the main project's `main` or `master` branch.
+6. Provide a clear description of your changes in the pull request.
+7. Make any requested changes during the review process.
+8. Once your PR is approved, it will be merged into the main project.

 ## Coding Guidelines

-This project uses an [`.editorconfig`](.editorconfig) file to define formatting standards (indentation, line endings, charset, etc.). Please configure your editor to respect it.
-
-For AI-assisted development, see [`CLAUDE.md`](CLAUDE.md) for agent-specific guidelines including build instructions and backend architecture details.
-
-### General Principles
-
- Write code that can be tested. All new features and bug fixes should include test coverage.
- Use comments sparingly to explain **why** code does something, not **what** it does. Comments should add context that would be difficult to deduce from reading the code alone.
- Keep changes focused. Avoid unrelated refactors, formatting changes, or feature additions in the same PR.
-
-### Go Code
-
- Prefer modern Go idioms — for example, use `any` instead of `interface{}`.
- Use [`golangci-lint`](https://golangci-lint.run) to catch common issues before submitting a PR.
- Use [`github.com/mudler/xlog`](https://github.com/mudler/xlog) for logging (same API as `slog`). Do not use `fmt.Println` or the standard `log` package for operational logging.
- Use tab indentation for Go files (as defined in `.editorconfig`).
-
-### Python Code
-
- Use 4-space indentation (as defined in `.editorconfig`).
- Include a `requirements.txt` for any new dependencies.
-
-### Code Review
-
- All contributions go through code review via pull requests.
- Reviewers will check for correctness, test coverage, adherence to these guidelines, and clarity of intent.
- Be responsive to review feedback and keep discussions constructive.
+- There are no specific coding guidelines at the moment. Please make sure the code can be tested. Popular lint tools such as [`golangci-lint`](https://golangci-lint.run) can help here.

 ## Testing

-All new features and bug fixes should include test coverage. The project uses [Ginkgo](https://onsi.github.io/ginkgo/) as its test framework.
+`make test` cannot currently handle all models. Please be sure to add a test case for any new feature or any part that was changed.

-### Running unit tests
+### Running AIO tests

-```bash
-make test
-```
-
-This downloads test model fixtures, runs protobuf generation, and executes the full test suite including llama-gguf, TTS, and stable-diffusion tests. Note: some tests require model files to be downloaded, so the first run may take longer.
-
-To run tests for a specific package:
-
-```bash
-go test ./core/config/...
-go test ./pkg/model/...
-```
-
-To run a specific test by name using Ginkgo's `--focus` flag:
-
-```bash
-go run github.com/onsi/ginkgo/v2/ginkgo --focus="should load a model" -v -r ./core/
-```
-
-### Running end-to-end tests
-
-The e2e tests run LocalAI in a Docker container and exercise the API:
-
-```bash
-make test-e2e
-```
-
-### Running E2E container tests
-
-These tests build a standard LocalAI Docker image and run it with pre-configured model configs to verify that most endpoints work correctly:
+The All-In-One (AIO) images have a set of tests that automatically verify that most of the endpoints work correctly. A typical flow is:

 ```bash
 # Build the LocalAI docker image
-make docker-build-e2e
+make DOCKER_IMAGE=local-ai docker

-# Run the e2e tests (uses model configs from tests/e2e-aio/models/)
-make e2e-aio
-```
+# Build the corresponding AIO image
+BASE_IMAGE=local-ai DOCKER_AIO_IMAGE=local-ai-aio:test make docker-aio

-### Testing backends
-
-To prepare and test extra (Python) backends:
-
-```bash
-make prepare-test-extra   # build Python backends for testing
-make test-extra           # run backend-specific tests
+# Run the AIO e2e tests
+LOCALAI_IMAGE_TAG=test LOCALAI_IMAGE=local-ai-aio make run-e2e-aio
 ```

 ## Documentation

-We welcome contributions to the documentation. Please open a new PR or create a new issue. The documentation is available under `docs/` https://github.com/mudler/LocalAI/tree/master/docs
-
-### Gallery YAML Schema
-
-LocalAI provides a JSON Schema for gallery model YAML files at:
-
-`core/schema/gallery-model.schema.json`
-
-This schema mirrors the internal gallery model configuration and can be used by editors (such as VS Code) to enable autocomplete, validation, and inline documentation when creating or modifying gallery files.
-
-To use it with the YAML language server, add the following comment at the top of a gallery YAML file:
-
-```yaml
-# yaml-language-server: $schema=../core/schema/gallery-model.schema.json
-```
-
+We welcome contributions to the documentation. Please open a new PR or create a new issue. The documentation is available under `docs/`: https://github.com/mudler/LocalAI/tree/master/docs
+
 ## Community and Communication

 - You can reach out via the Github issue tracker.
 - Open a new discussion at [Discussion](https://github.com/go-skynet/LocalAI/discussions)
 - Join the Discord channel [Discord](https://discord.gg/uJAeKSAGDy)
+
+---
--- a/536
+++ b/536
@@ -1,394 +1,294 @@
-ARG BASE_IMAGE=ubuntu:24.04
-ARG GRPC_BASE_IMAGE=${BASE_IMAGE}
-ARG INTEL_BASE_IMAGE=${BASE_IMAGE}
-ARG UBUNTU_CODENAME=noble
+ARG IMAGE_TYPE=extras
+ARG BASE_IMAGE=ubuntu:22.04

-FROM ${BASE_IMAGE} AS requirements
+# extras or core
+FROM ${BASE_IMAGE} as requirements-core

-ENV DEBIAN_FRONTEND=noninteractive
-
-RUN apt-get update && \
-    apt-get install -y --no-install-recommends \
-        ca-certificates curl wget espeak-ng libgomp1 \
-        ffmpeg libopenblas0 libopenblas-dev libopus0 sox && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/*
-
-# The requirements-drivers target is for BUILD_TYPE specific items.  If you need to install something specific to CUDA, or specific to ROCM, it goes here.
-FROM requirements AS requirements-drivers
+USER root

+ARG GO_VERSION=1.21.7
 ARG BUILD_TYPE
-ARG CUDA_MAJOR_VERSION=12
-ARG CUDA_MINOR_VERSION=0
-ARG SKIP_DRIVERS=false
+ARG CUDA_MAJOR_VERSION=11
+ARG CUDA_MINOR_VERSION=7
 ARG TARGETARCH
 ARG TARGETVARIANT
+
 ENV BUILD_TYPE=${BUILD_TYPE}
-ARG UBUNTU_VERSION=2404
+ENV DEBIAN_FRONTEND=noninteractive
+ENV EXTERNAL_GRPC_BACKENDS="coqui:/build/backend/python/coqui/run.sh,huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh,petals:/build/backend/python/petals/run.sh,transformers:/build/backend/python/transformers/run.sh,sentencetransformers:/build/backend/python/sentencetransformers/run.sh,autogptq:/build/backend/python/autogptq/run.sh,bark:/build/backend/python/bark/run.sh,diffusers:/build/backend/python/diffusers/run.sh,exllama:/build/backend/python/exllama/run.sh,vall-e-x:/build/backend/python/vall-e-x/run.sh,vllm:/build/backend/python/vllm/run.sh,mamba:/build/backend/python/mamba/run.sh,exllama2:/build/backend/python/exllama2/run.sh,transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh,parler-tts:/build/backend/python/parler-tts/run.sh"

-RUN mkdir -p /run/localai
-RUN echo "default" > /run/localai/capability
-
-# Vulkan requirements
-RUN <<EOT bash
-    if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
-        apt-get update && \
-        apt-get install -y  --no-install-recommends \
-            software-properties-common pciutils wget gpg-agent && \
-        apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
-            libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
-            libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
-            git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
-            ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
-            clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils mesa-vulkan-drivers
-        if [ "amd64" = "$TARGETARCH" ]; then
-            wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
-            tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
-            rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
-            mkdir -p /opt/vulkan-sdk && \
-            mv 1.4.335.0 /opt/vulkan-sdk/ && \
-            cd /opt/vulkan-sdk/1.4.335.0 && \
-            ./vulkansdk --no-deps --maxjobs \
-                vulkan-loader \
-                vulkan-validationlayers \
-                vulkan-extensionlayer \
-                vulkan-tools \
-                shaderc && \
-            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
-            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
-            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
-            cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
-            rm -rf /opt/vulkan-sdk
-        fi
-        if [ "arm64" = "$TARGETARCH" ]; then
-            mkdir vulkan && cd vulkan && \
-            curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
-            tar -xvf vulkan-sdk.tar.xz && \
-            rm vulkan-sdk.tar.xz && \
-            cd 1.4.335.0 && \
-            cp -rfv aarch64/bin/* /usr/bin/ && \
-            cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
-            cp -rfv aarch64/include/* /usr/include/ && \
-            cp -rfv aarch64/share/* /usr/share/ && \
-            cd ../.. && \
-            rm -rf vulkan
-        fi
-        ldconfig && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/* && \
-        echo "vulkan" > /run/localai/capability
-    fi
-EOT
-
-# CuBLAS requirements
-RUN <<EOT bash
-    if ( [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "l4t" ] ) && [ "${SKIP_DRIVERS}" = "false" ]; then
-        apt-get update && \
-        apt-get install -y  --no-install-recommends \
-            software-properties-common pciutils
-        if [ "amd64" = "$TARGETARCH" ]; then
-            curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
-        fi
-        if [ "arm64" = "$TARGETARCH" ]; then
-            if [ "${CUDA_MAJOR_VERSION}" = "13" ]; then
-                curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/sbsa/cuda-keyring_1.1-1_all.deb
-            else
-                curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/arm64/cuda-keyring_1.1-1_all.deb
-            fi
-        fi
-        dpkg -i cuda-keyring_1.1-1_all.deb && \
-        rm -f cuda-keyring_1.1-1_all.deb && \
-        apt-get update && \
-        apt-get install -y --no-install-recommends \
-            cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
-            libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
-        if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
-            apt-get install -y --no-install-recommends \
-            libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
-        fi
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/* && \
-        echo "nvidia-cuda-${CUDA_MAJOR_VERSION}" > /run/localai/capability
-    fi
-EOT
-
-RUN <<EOT bash
-    if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
-        echo "nvidia-l4t-cuda-${CUDA_MAJOR_VERSION}" > /run/localai/capability
-    fi
-EOT
-
-# https://github.com/NVIDIA/Isaac-GR00T/issues/343
-RUN <<EOT bash
-    if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
-        wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
-        dpkg -i cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
-        cp /var/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
-        apt-get update && apt-get -y install cudss cudss-cuda-${CUDA_MAJOR_VERSION} && \
-        wget https://developer.download.nvidia.com/compute/nvpl/25.5/local_installers/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
-        dpkg -i nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
-        cp /var/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5/nvpl-*-keyring.gpg /usr/share/keyrings/ && \
-        apt-get update && apt-get install -y nvpl
-    fi
-EOT
-
-# If we are building with clblas support, we need the libraries for the builds
-RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
-        apt-get update && \
-        apt-get install -y --no-install-recommends \
-            libclblast-dev && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/* \
-    ; fi
-
-RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
-        apt-get update && \
-        apt-get install -y --no-install-recommends \
-            hipblas-dev \
-            rocblas-dev && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/* && \
-        echo "amd" > /run/localai/capability && \
-        # I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
-        # to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
-        ldconfig \
-    ; fi
-
-RUN if [ "${BUILD_TYPE}" = "hipblas" ]; then \
-    ln -s /opt/rocm-**/lib/llvm/lib/libomp.so /usr/lib/libomp.so \
-    ; fi
-
-RUN expr "${BUILD_TYPE}" = intel && echo "intel" > /run/localai/capability || echo "not intel"
-
-# Cuda
-ENV PATH=/usr/local/cuda/bin:${PATH}
-
-# HipBLAS requirements
-ENV PATH=/opt/rocm/bin:${PATH}
-
-###################################
-###################################
-
-# The requirements-core target is common to all images.  It should not be placed in requirements-core unless every single build will use it.
-FROM requirements-drivers AS build-requirements
-
-ARG GO_VERSION=1.26.0
-ARG CMAKE_VERSION=3.31.10
-ARG CMAKE_FROM_SOURCE=false
-ARG TARGETARCH
-ARG TARGETVARIANT
+ARG GO_TAGS="stablediffusion tinydream tts"

 RUN apt-get update && \
-    apt-get install -y --no-install-recommends \
-        build-essential \
-        ccache \
-        ca-certificates espeak-ng \
-        curl libssl-dev \
-        git \
-        git-lfs \
-        libopus-dev pkg-config \
-        unzip upx-ucl python3 python-is-python3 && \
-    apt-get clean && \
-    rm -rf /var/lib/apt/lists/*
-
-# Install CMake (the version in 22.04 is too old)
-RUN <<EOT bash
-    if [ "${CMAKE_FROM_SOURCE}" = "true" ]; then
-        curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
-    else
-        apt-get update && \
-        apt-get install -y \
-            cmake && \
-        apt-get clean && \
-        rm -rf /var/lib/apt/lists/*
-    fi
-EOT
+    apt-get install -y ca-certificates curl python3-pip unzip && apt-get clean

 # Install Go
-RUN curl -L -s https://go.dev/dl/go${GO_VERSION}.linux-${TARGETARCH}.tar.gz | tar -C /usr/local -xz
-ENV PATH=$PATH:/root/go/bin:/usr/local/go/bin
+RUN curl -L -s https://go.dev/dl/go$GO_VERSION.linux-$TARGETARCH.tar.gz | tar -C /usr/local -xz
+ENV PATH $PATH:/usr/local/go/bin

 # Install grpc compilers
-RUN go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2 && \
-    go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af
+ENV PATH $PATH:/root/go/bin
+RUN go install google.golang.org/protobuf/cmd/protoc-gen-go@latest && \
+    go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
+
+# Install protobuf (the version in 22.04 is too old)
+RUN curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \
+    unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
+    rm protoc.zip
+
+# Install grpcio-tools (the version in 22.04 is too old)
+RUN pip install --user grpcio-tools

 COPY --chmod=644 custom-ca-certs/* /usr/local/share/ca-certificates/
 RUN update-ca-certificates

-RUN test -n "$TARGETARCH" \
-    || (echo 'warn: missing $TARGETARCH, either set this `ARG` manually, or run using `docker buildkit`')
-
 # Use the variables in subsequent instructions
 RUN echo "Target Architecture: $TARGETARCH"
 RUN echo "Target Variant: $TARGETVARIANT"

+# CuBLAS requirements
+RUN if [ "${BUILD_TYPE}" = "cublas" ]; then \
+    apt-get install -y software-properties-common && \
+    curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
+    dpkg -i cuda-keyring_1.1-1_all.deb && \
+    rm -f cuda-keyring_1.1-1_all.deb && \
+    apt-get update && \
+    apt-get install -y cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}  && apt-get clean \
+    ; fi

+# Cuda
+ENV PATH /usr/local/cuda/bin:${PATH}

+# HipBLAS requirements
+ENV PATH /opt/rocm/bin:${PATH}
+
+# OpenBLAS requirements and stable diffusion
+RUN apt-get install -y \
+    libopenblas-dev \
+    libopencv-dev \
+    && apt-get clean
+
+# Set up OpenCV
+RUN ln -s /usr/include/opencv4/opencv2 /usr/include/opencv2

 WORKDIR /build

+RUN test -n "$TARGETARCH" \
+    || (echo 'warn: missing $TARGETARCH, either set this `ARG` manually, or run using `docker buildkit`')

 ###################################
 ###################################

-# Temporary workaround for Intel's repository to work correctly
-# https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/APT-Repository-not-working-signatures-invalid/m-p/1599436/highlight/true#M36143
-# This is a temporary workaround until Intel fixes their repository
-FROM ${INTEL_BASE_IMAGE} AS intel
-ARG UBUNTU_CODENAME=noble
-RUN wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
-gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg
-RUN echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu ${UBUNTU_CODENAME}/lts/2350 unified" > /etc/apt/sources.list.d/intel-graphics.list
+FROM requirements-core as requirements-extras
+
+RUN apt-get install -y gpg && \
+    curl https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc | gpg --dearmor > conda.gpg && \
+    install -o root -g root -m 644 conda.gpg /usr/share/keyrings/conda-archive-keyring.gpg && \
+    gpg --keyring /usr/share/keyrings/conda-archive-keyring.gpg --no-default-keyring --fingerprint 34161F5BF5EB1D4BFBBB8F0A8AEB4F8B29D82806 && \
+    echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" > /etc/apt/sources.list.d/conda.list && \
+    echo "deb [arch=amd64 signed-by=/usr/share/keyrings/conda-archive-keyring.gpg] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main" | tee -a /etc/apt/sources.list.d/conda.list && \
+    apt-get update && \
+    apt-get install -y conda && apt-get clean
+
+ENV PATH="/root/.cargo/bin:${PATH}"
+RUN apt-get install -y python3-pip && apt-get clean
+RUN pip install --upgrade pip
+
+RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
+RUN apt-get install -y espeak-ng espeak && apt-get clean
+
+RUN if [ ! -e /usr/bin/python ]; then \
+	  ln -s /usr/bin/python3 /usr/bin/python \
+    ; fi
+
+###################################
+###################################
+
+FROM ${BASE_IMAGE} as grpc
+
+ARG MAKEFLAGS
+ARG GRPC_VERSION=v1.58.0
+
+ENV MAKEFLAGS=${MAKEFLAGS}
+
+WORKDIR /build
+
 RUN apt-get update && \
-    apt-get install -y --no-install-recommends \
-        intel-oneapi-runtime-libs && \
+    apt-get install -y build-essential cmake git  && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

+RUN git clone --recurse-submodules --jobs 4 -b ${GRPC_VERSION} --depth 1 --shallow-submodules https://github.com/grpc/grpc
+
+RUN cd grpc && \
+    mkdir -p cmake/build && \
+    cd cmake/build && \
+    cmake -DgRPC_INSTALL=ON -DgRPC_BUILD_TESTS=OFF ../.. && \
+    make
+
 ###################################
 ###################################

-# The builder-base target has the arguments, variables, and copies shared between full builder images and the uncompiled devcontainer
+FROM requirements-${IMAGE_TYPE} as builder

-FROM build-requirements AS builder-base
-
-ARG GO_TAGS="auth"
+ARG GO_TAGS="stablediffusion tts"
 ARG GRPC_BACKENDS
 ARG MAKEFLAGS
-ARG LD_FLAGS="-s -w"
-ARG TARGETARCH
-ARG TARGETVARIANT
+
 ENV GRPC_BACKENDS=${GRPC_BACKENDS}
 ENV GO_TAGS=${GO_TAGS}
 ENV MAKEFLAGS=${MAKEFLAGS}
 ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
 ENV NVIDIA_REQUIRE_CUDA="cuda>=${CUDA_MAJOR_VERSION}.0"
 ENV NVIDIA_VISIBLE_DEVICES=all
-ENV LD_FLAGS=${LD_FLAGS}
-
-RUN echo "GO_TAGS: $GO_TAGS" && echo "TARGETARCH: $TARGETARCH"
-
-WORKDIR /build
-
-
-# We need protoc installed, and the version in 22.04 is too old.
-RUN <<EOT bash
-    if [ "amd64" = "$TARGETARCH" ]; then
-        curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-x86_64.zip -o protoc.zip && \
-        unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
-        rm protoc.zip
-    fi
-    if [ "arm64" = "$TARGETARCH" ]; then
-        curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-aarch_64.zip -o protoc.zip && \
-        unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
-        rm protoc.zip
-    fi
-EOT
-
-###################################
-###################################
-
-# Build React UI
-FROM node:25-slim AS react-ui-builder
-WORKDIR /app
-COPY core/http/react-ui/package*.json ./
-RUN npm install
-COPY core/http/react-ui/ ./
-RUN npm run build
-
-###################################
-###################################
-
-# Compile backends first in a separate stage
-FROM builder-base AS builder-backends
-ARG TARGETARCH
-ARG TARGETVARIANT
-
-WORKDIR /build
-
-COPY ./Makefile .
-COPY ./backend ./backend
-COPY ./go.mod .
-COPY ./go.sum .
-COPY ./.git ./.git
-
-# Some of the Go backends use libs from the main src, we could further optimize the caching by building the CPP backends before here
-COPY ./pkg/grpc ./pkg/grpc
-COPY ./pkg/utils ./pkg/utils
-
-RUN ls -l ./
-RUN make protogen-go
-
-# The builder target compiles LocalAI. This target is not the target that will be uploaded to the registry.
-# Adjustments to the build process should likely be made here.
-FROM builder-backends AS builder

 WORKDIR /build

 COPY . .
-
-# Copy pre-built React UI
-COPY --from=react-ui-builder /app/dist ./core/http/react-ui/dist
-
-## Build the binary
-## If we're on arm64 AND using cublas/hipblas, skip some of the llama-compat backends to save space
-## Otherwise just run the normal build
-RUN make build
-
-###################################
-###################################
-
-# The devcontainer target is not used on CI. It is a target for developers to use locally -
-# rather than copying files it mounts them locally and leaves building to the developer
-
-FROM builder-base AS devcontainer
-
-COPY .devcontainer-scripts /.devcontainer-scripts
+COPY .git .
+RUN echo "GO_TAGS: $GO_TAGS"

 RUN apt-get update && \
-    apt-get install -y --no-install-recommends \
-        ssh less
-# For the devcontainer, leave apt functional in case additional devtools are needed at runtime.
+    apt-get install -y build-essential cmake git  && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*

-RUN go install github.com/go-delve/delve/cmd/dlv@latest
+RUN make prepare

-RUN go install github.com/mikefarah/yq/v4@latest
+# If we are building with clblas support, we need the libraries for the builds
+RUN if [ "${BUILD_TYPE}" = "clblas" ]; then \
+    apt-get update && \
+    apt-get install -y libclblast-dev && \
+    apt-get clean \
+    ; fi
+
+# stablediffusion does not tolerate a newer version of abseil, build it first
+RUN GRPC_BACKENDS=backend-assets/grpc/stablediffusion make build
+
+COPY --from=grpc /build/grpc ./grpc/
+
+RUN cd /build/grpc/cmake/build && make install
+
+# Rebuild with the default backends
+RUN make build
+
+RUN if [ ! -d "/build/sources/go-piper/piper-phonemize/pi/lib/" ]; then \
+    mkdir -p /build/sources/go-piper/piper-phonemize/pi/lib/ && \
+    touch /build/sources/go-piper/piper-phonemize/pi/lib/keep \
+    ; fi

 ###################################
 ###################################

-# This is the final target. The result of this target will be the image uploaded to the registry.
-# If you cannot find a more suitable place for an addition, this layer is a suitable place for it.
-FROM requirements-drivers
+FROM requirements-${IMAGE_TYPE}

+ARG FFMPEG
+ARG BUILD_TYPE
+ARG TARGETARCH
+ARG IMAGE_TYPE=extras
+ARG MAKEFLAGS
+
+ENV BUILD_TYPE=${BUILD_TYPE}
+ENV REBUILD=false
 ENV HEALTHCHECK_ENDPOINT=http://localhost:8080/readyz
+ENV MAKEFLAGS=${MAKEFLAGS}

-ARG CUDA_MAJOR_VERSION=12
+ARG CUDA_MAJOR_VERSION=11
 ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
 ENV NVIDIA_REQUIRE_CUDA="cuda>=${CUDA_MAJOR_VERSION}.0"
 ENV NVIDIA_VISIBLE_DEVICES=all
+ENV PIP_CACHE_PURGE=true

-WORKDIR /
+# Add FFmpeg
+RUN if [ "${FFMPEG}" = "true" ]; then \
+    apt-get install -y ffmpeg && apt-get clean \
+    ; fi

-COPY ./entrypoint.sh .
+# Add OpenCL
+RUN if [ "${BUILD_TYPE}" = "clblas" ]; then \
+    apt-get update && \
+    apt-get install -y libclblast1 && \
+    apt-get clean \
+    ; fi
+
+RUN apt-get update && \
+    apt-get install -y cmake git  && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+WORKDIR /build
+
+# we start fresh & re-copy all assets because `make build` does not clean up nicely after itself
+# so when `entrypoint.sh` runs `make build` again (which it does by default), the build would fail
+# see https://github.com/go-skynet/LocalAI/pull/658#discussion_r1241971626 and
+# https://github.com/go-skynet/LocalAI/pull/434
+COPY . .
+
+COPY --from=builder /build/sources ./sources/
+COPY --from=grpc /build/grpc ./grpc/
+
+RUN make prepare-sources && cd /build/grpc/cmake/build && make install && rm -rf /build/grpc

 # Copy the binary
 COPY --from=builder /build/local-ai ./
-# Copy the opus shim if it was built
-RUN --mount=from=builder,src=/build/,dst=/mnt/build \
-    if [ -f /mnt/build/libopusshim.so ]; then cp /mnt/build/libopusshim.so ./; fi
+
+# Copy shared libraries for piper
+COPY --from=builder /build/sources/go-piper/piper-phonemize/pi/lib/* /usr/lib/
+
+# do not let stablediffusion rebuild (requires an older version of absl)
+COPY --from=builder /build/backend-assets/grpc/stablediffusion ./backend-assets/grpc/stablediffusion
+
+## Duplicated from Makefile to avoid having a big layer that's hard to push
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/autogptq \
+    ; fi
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/bark \
+    ; fi
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/diffusers \
+    ; fi
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/vllm \
+    ; fi
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/mamba \
+    ; fi
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/sentencetransformers \
+    ; fi
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/transformers \
+    ; fi
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/vall-e-x \
+    ; fi
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/exllama \
+    ; fi
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/exllama2 \
+    ; fi
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/petals \
+    ; fi
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/transformers-musicgen \
+    ; fi
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/parler-tts \
+    ; fi
+RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
+    make -C backend/python/coqui \
+    ; fi

 # Make sure the models directory exists
-RUN mkdir -p /models /backends /data
+RUN mkdir -p /build/models

 # Define the health check command
 HEALTHCHECK --interval=1m --timeout=10m --retries=10 \
-  CMD curl -f ${HEALTHCHECK_ENDPOINT} || exit 1
-
-VOLUME /models /backends /configuration /data
+  CMD curl -f $HEALTHCHECK_ENDPOINT || exit 1
+  
+VOLUME /build/models
 EXPOSE 8080
-ENTRYPOINT [ "/entrypoint.sh" ]
+ENTRYPOINT [ "/build/entrypoint.sh" ]
--- a/Dockerfile.aio
+++ b/Dockerfile.aio
@@ -0,0 +1,8 @@
+ARG BASE_IMAGE=ubuntu:22.04
+
+FROM ${BASE_IMAGE} 
+
+RUN apt-get update && apt-get install -y pciutils && apt-get clean
+
+COPY aio/ /aio
+ENTRYPOINT [ "/aio/entrypoint.sh" ]
--- a/5
+++ b/5
@@ -0,0 +1,5 @@
+VERSION 0.7
+
+build:
+    FROM DOCKERFILE -f Dockerfile .
+    SAVE ARTIFACT /usr/bin/local-ai AS LOCAL local-ai
--- a/2
+++ b/2
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2023-2025 Ettore Di Giacinto (mudler@localai.io)
+Copyright (c) 2023-2024 Ettore Di Giacinto (mudler@localai.io)

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
--- a/1094
+++ b/1094
--- a/README.md
+++ b/README.md
@@ -1,189 +1,138 @@
 <h1 align="center">
  <br>
-  <img width="300" src="./core/http/static/logo.png"> <br>
+  <img height="300" src="https://github.com/go-skynet/LocalAI/assets/2420543/0966aa2a-166e-4f99-a3e5-6c915fc997dd"> <br>
+    LocalAI
 <br>
 </h1>

 <p align="center">
+<a href="https://github.com/go-skynet/LocalAI/fork" target="blank">
+<img src="https://img.shields.io/github/forks/go-skynet/LocalAI?style=for-the-badge" alt="LocalAI forks"/>
+</a>
 <a href="https://github.com/go-skynet/LocalAI/stargazers" target="blank">
 <img src="https://img.shields.io/github/stars/go-skynet/LocalAI?style=for-the-badge" alt="LocalAI stars"/>
 </a>
+<a href="https://github.com/go-skynet/LocalAI/pulls" target="blank">
+<img src="https://img.shields.io/github/issues-pr/go-skynet/LocalAI?style=for-the-badge" alt="LocalAI pull-requests"/>
+</a>
 <a href='https://github.com/go-skynet/LocalAI/releases'>
 <img src='https://img.shields.io/github/release/go-skynet/LocalAI?&label=Latest&style=for-the-badge'>
 </a>
-<a href="LICENSE" target="blank">
-<img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge" alt="LocalAI License"/>
+</p>
+
+<p align="center">
+<a href="https://hub.docker.com/r/localai/localai" target="blank">
+<img src="https://img.shields.io/badge/dockerhub-images-important.svg?logo=Docker" alt="LocalAI Docker hub"/>
+</a>
+<a href="https://quay.io/repository/go-skynet/local-ai?tab=tags&tag=latest" target="blank">
+<img src="https://img.shields.io/badge/quay.io-images-important.svg?" alt="LocalAI Quay.io"/>
 </a>
 </p>

 <p align="center">
 <a href="https://twitter.com/LocalAI_API" target="blank">
-<img src="https://img.shields.io/badge/X-%23000000.svg?style=for-the-badge&logo=X&logoColor=white&label=LocalAI_API" alt="Follow LocalAI_API"/>
+<img src="https://img.shields.io/twitter/follow/LocalAI_API?label=Follow: LocalAI_API&style=social" alt="Follow LocalAI_API"/>
 </a>
 <a href="https://discord.gg/uJAeKSAGDy" target="blank">
-<img src="https://img.shields.io/badge/dynamic/json?color=blue&label=Discord&style=for-the-badge&query=approximate_member_count&url=https%3A%2F%2Fdiscordapp.com%2Fapi%2Finvites%2FuJAeKSAGDy%3Fwith_counts%3Dtrue&logo=discord" alt="Join LocalAI Discord Community"/>
+<img src="https://dcbadge.vercel.app/api/server/uJAeKSAGDy?style=flat-square&theme=default-inverted" alt="Join LocalAI Discord Community"/>
 </a>
 </p>

-<p align="center">
-<a href="https://trendshift.io/repositories/5539" target="_blank"><img src="https://trendshift.io/api/badge/repositories/5539" alt="mudler%2FLocalAI | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
-</p>
+> :bulb: Get help - [❓FAQ](https://localai.io/faq/) [💭Discussions](https://github.com/go-skynet/LocalAI/discussions) [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) [:book: Documentation website](https://localai.io/)
+>
+> [💻 Quickstart](https://localai.io/basics/getting_started/) [📣 News](https://localai.io/basics/news/) [ 🛫 Examples ](https://github.com/go-skynet/LocalAI/tree/master/examples/) [ 🖼️ Models ](https://localai.io/models/) [ 🚀 Roadmap ](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)

-**LocalAI** is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
+[![tests](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/test.yml)[![Build and Release](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/release.yaml)[![build container images](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/image.yml)[![Bump dependencies](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml/badge.svg)](https://github.com/go-skynet/LocalAI/actions/workflows/bump_deps.yaml)[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/localai)](https://artifacthub.io/packages/search?repo=localai)

- **Drop-in API compatibility** — OpenAI, Anthropic, ElevenLabs APIs
- **35+ backends** — llama.cpp, vLLM, transformers, whisper, diffusers, MLX...
- **Any hardware** — NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
- **Multi-user ready** — API key auth, user quotas, role-based access
- **Built-in AI agents** — autonomous agents with tool use, RAG, MCP, and skills
- **Privacy-first** — your data never leaves your infrastructure
+**LocalAI** is the free, Open Source OpenAI alternative. LocalAI acts as a drop-in replacement REST API that’s compatible with the OpenAI (and Elevenlabs, Anthropic, ...) API specifications for local AI inferencing. It allows you to run LLMs and generate images, audio, and more, locally or on-prem with consumer-grade hardware, supporting multiple model families. It does not require a GPU.

-Created and maintained by [Ettore Di Giacinto](https://github.com/mudler).
+## 🔥🔥 Hot topics / Roadmap

-> [:book: Documentation](https://localai.io/) | [:speech_balloon: Discord](https://discord.gg/uJAeKSAGDy) | [💻 Quickstart](https://localai.io/basics/getting_started/) | [🖼️ Models](https://models.localai.io/) | [❓FAQ](https://localai.io/faq/)
+[Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap)

-## Screenshots
+- Parler-TTS: https://github.com/mudler/LocalAI/pull/2027
+- Landing page: https://github.com/mudler/LocalAI/pull/1922
+- Openvino support: https://github.com/mudler/LocalAI/pull/1892
+- Vector store: https://github.com/mudler/LocalAI/pull/1795
+- All-in-one container image: https://github.com/mudler/LocalAI/issues/1855
+- Parallel function calling: https://github.com/mudler/LocalAI/pull/1726 / Tools API support: https://github.com/mudler/LocalAI/pull/1715

-### Chat, Model gallery
+Hot topics (looking for contributors):
+- Backends v2: https://github.com/mudler/LocalAI/issues/1126
+- Improving UX v2: https://github.com/mudler/LocalAI/issues/1373
+- Assistant API: https://github.com/mudler/LocalAI/issues/1273
+- Moderation endpoint: https://github.com/mudler/LocalAI/issues/999
+- Vulkan: https://github.com/mudler/LocalAI/issues/1647

-https://github.com/user-attachments/assets/08cbb692-57da-48f7-963d-2e7b43883c18
+If you want to help and contribute, issues up for grabs: https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3A%22up+for+grabs%22

-### Agents
+## 💻 [Getting started](https://localai.io/basics/getting_started/index.html)

-https://github.com/user-attachments/assets/6270b331-e21d-4087-a540-6290006b381a
+For a detailed step-by-step introduction, refer to the [Getting Started](https://localai.io/basics/getting_started/index.html) guide. 

-## Quickstart
-
-### macOS
-
-<a href="https://github.com/mudler/LocalAI/releases/latest/download/LocalAI.dmg">
-  <img src="https://img.shields.io/badge/Download-macOS-blue?style=for-the-badge&logo=apple&logoColor=white" alt="Download LocalAI for macOS"/>
-</a>
-
-> **Note:** The DMG is not signed by Apple. After installing, run: `sudo xattr -d com.apple.quarantine /Applications/LocalAI.app`. See [#6268](https://github.com/mudler/LocalAI/issues/6268) for details.
-
-### Containers (Docker, podman, ...)
-
-> Already ran LocalAI before? Use `docker start -i local-ai` to restart an existing container.
-
-#### CPU only:
+For those in a hurry, here's a straightforward one-liner to launch a LocalAI AIO (All-in-One) image using `docker`:

 ```bash
-docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
+docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
+# or, if you have an Nvidia GPU:
+# docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
 ```

-#### NVIDIA GPU:
+## 🚀 [Features](https://localai.io/features/)

-```bash
-# CUDA 13
-docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
+- 📖 [Text generation with GPTs](https://localai.io/features/text-generation/) (`llama.cpp`, `gpt4all.cpp`, ... [:book: and more](https://localai.io/model-compatibility/index.html#model-compatibility-table))
+- 🗣 [Text to Audio](https://localai.io/features/text-to-audio/)
+- 🔈 [Audio to Text](https://localai.io/features/audio-to-text/) (Audio transcription with `whisper.cpp`)
+- 🎨 [Image generation with stable diffusion](https://localai.io/features/image-generation)
+- 🔥 [OpenAI functions](https://localai.io/features/openai-functions/) 🆕
+- 🧠 [Embeddings generation for vector databases](https://localai.io/features/embeddings/)
+- ✍️ [Constrained grammars](https://localai.io/features/constrained_grammars/)
+- 🖼️ [Download Models directly from Huggingface ](https://localai.io/models/)
+- 🆕 [Vision API](https://localai.io/features/gpt-vision/)

-# CUDA 12
-docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
+## 💻 Usage

-# NVIDIA Jetson ARM64 (CUDA 12, for AGX Orin and similar)
-docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64
+Check out the [Getting started](https://localai.io/basics/getting_started/index.html) section in our documentation.

-# NVIDIA Jetson ARM64 (CUDA 13, for DGX Spark)
-docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13
-```
+### 🔗 Community and integrations

-#### AMD GPU (ROCm):
+Build and deploy custom containers:
+- https://github.com/sozercan/aikit

-```bash
-docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas
-```
+WebUIs:
+- https://github.com/Jirubizu/localai-admin
+- https://github.com/go-skynet/LocalAI-frontend

-#### Intel GPU (oneAPI):
+Model galleries:
+- https://github.com/go-skynet/model-gallery

-```bash
-docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel
-```
+Other:
+- Helm chart https://github.com/go-skynet/helm-charts
+- VSCode extension https://github.com/badgooooor/localai-vscode-plugin
+- Local Smart assistant https://github.com/mudler/LocalAGI
+- Home Assistant https://github.com/sammcj/homeassistant-localai / https://github.com/drndos/hass-openai-custom-conversation
+- Discord bot https://github.com/mudler/LocalAGI/tree/main/examples/discord
+- Slack bot https://github.com/mudler/LocalAGI/tree/main/examples/slack
+- Telegram bot https://github.com/mudler/LocalAI/tree/master/examples/telegram-bot
+- Examples: https://github.com/mudler/LocalAI/tree/master/examples/
+  

-#### Vulkan GPU:
+### 🔗 Resources

-```bash
-docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan
-```
+- 🆕 New! [LLM finetuning guide](https://localai.io/docs/advanced/fine-tuning/)
+- [How to build locally](https://localai.io/basics/build/index.html)
+- [How to install in Kubernetes](https://localai.io/basics/getting_started/index.html#run-localai-in-kubernetes)
+- [Projects integrating LocalAI](https://localai.io/docs/integrations/)
+- [How tos section](https://io.midori-ai.xyz/howtos/) (curated by our community)

-### Loading models
+## :book: 🎥 [Media, Blogs, Social](https://localai.io/basics/news/#media-blogs-social)

-```bash
-# From the model gallery (see available models with `local-ai models list` or at https://models.localai.io)
-local-ai run llama-3.2-1b-instruct:q4_k_m
-# From Huggingface
-local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
-# From the Ollama OCI registry
-local-ai run ollama://gemma:2b
-# From a YAML config
-local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
-# From a standard OCI registry (e.g., Docker Hub)
-local-ai run oci://localai/phi-2:latest
-```
-
-> **Automatic Backend Detection**: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see [GPU Acceleration](https://localai.io/features/gpu-acceleration/).
-
-For more details, see the [Getting Started guide](https://localai.io/basics/getting_started/).
-
-## Latest News
-
- **March 2026**: [Agent management](https://github.com/mudler/LocalAI/pull/8820), [New React UI](https://github.com/mudler/LocalAI/pull/8772), [WebRTC](https://github.com/mudler/LocalAI/pull/8790), [MLX-distributed via P2P and RDMA](https://github.com/mudler/LocalAI/pull/8801), [MCP Apps, MCP Client-side](https://github.com/mudler/LocalAI/pull/8947)
- **February 2026**: [Realtime API for audio-to-audio with tool calling](https://github.com/mudler/LocalAI/pull/6245), [ACE-Step 1.5 support](https://github.com/mudler/LocalAI/pull/8396)
- **January 2026**: **LocalAI 3.10.0** — Anthropic API support, Open Responses API, video & image generation (LTX-2), unified GPU backends, tool streaming, Moonshine, Pocket-TTS. [Release notes](https://github.com/mudler/LocalAI/releases/tag/v3.10.0)
- **December 2025**: [Dynamic Memory Resource reclaimer](https://github.com/mudler/LocalAI/pull/7583), [Automatic multi-GPU model fitting (llama.cpp)](https://github.com/mudler/LocalAI/pull/7584), [Vibevoice backend](https://github.com/mudler/LocalAI/pull/7494)
- **November 2025**: [Import models via URL](https://github.com/mudler/LocalAI/pull/7245), [Multiple chats and history](https://github.com/mudler/LocalAI/pull/7325)
- **October 2025**: [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/) support for agentic capabilities
- **September 2025**: New Launcher for macOS and Linux, extended backend support for Mac and Nvidia L4T, MLX-Audio, WAN 2.2
- **August 2025**: MLX, MLX-VLM, Diffusers, llama.cpp now supported on Apple Silicon
- **July 2025**: All backends migrated outside the main binary — [lightweight, modular architecture](https://github.com/mudler/LocalAI/releases/tag/v3.2.0)
-
-For older news and full release notes, see [GitHub Releases](https://github.com/mudler/LocalAI/releases) and the [News page](https://localai.io/basics/news/).
-
-## Features
-
- [Text generation](https://localai.io/features/text-generation/) (`llama.cpp`, `transformers`, `vllm` ... [and more](https://localai.io/model-compatibility/))
- [Text to Audio](https://localai.io/features/text-to-audio/)
- [Audio to Text](https://localai.io/features/audio-to-text/)
- [Image generation](https://localai.io/features/image-generation)
- [OpenAI-compatible tools API](https://localai.io/features/openai-functions/)
- [Realtime API](https://localai.io/features/openai-realtime/) (Speech-to-speech)
- [Embeddings generation](https://localai.io/features/embeddings/)
- [Constrained grammars](https://localai.io/features/constrained_grammars/)
- [Download models from Huggingface](https://localai.io/models/)
- [Vision API](https://localai.io/features/gpt-vision/)
- [Object Detection](https://localai.io/features/object-detection/)
- [Reranker API](https://localai.io/features/reranker/)
- [P2P Inferencing](https://localai.io/features/distribute/)
- [Distributed Mode](https://localai.io/features/distributed-mode/) — Horizontal scaling with PostgreSQL + NATS
- [Model Context Protocol (MCP)](https://localai.io/docs/features/mcp/)
- [Built-in Agents](https://localai.io/features/agents/) — Autonomous AI agents with tool use, RAG, skills, SSE streaming, and [Agent Hub](https://agenthub.localai.io)
- [Backend Gallery](https://localai.io/backends/) — Install/remove backends on the fly via OCI images
- Voice Activity Detection (Silero-VAD)
- Integrated WebUI
-
-## Supported Backends & Acceleration
-
-LocalAI supports **35+ backends** including llama.cpp, vLLM, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for **NVIDIA** (CUDA 12/13), **AMD** (ROCm), **Intel** (oneAPI/SYCL), **Apple Silicon** (Metal), **Vulkan**, and **NVIDIA Jetson** (L4T). All backends can be installed on-the-fly from the [Backend Gallery](https://localai.io/backends/).
-
-See the full [Backend & Model Compatibility Table](https://localai.io/model-compatibility/) and [GPU Acceleration guide](https://localai.io/features/gpu-acceleration/).
-
-## Resources
-
- [Documentation](https://localai.io/)
- [LLM fine-tuning guide](https://localai.io/docs/advanced/fine-tuning/)
- [Build from source](https://localai.io/basics/build/)
- [Kubernetes installation](https://localai.io/basics/getting_started/#run-localai-in-kubernetes)
- [Integrations & community projects](https://localai.io/docs/integrations/)
- [Media & blog posts](https://localai.io/basics/news/#media-blogs-social)
- [Examples](https://github.com/mudler/LocalAI-examples)
-
-## Autonomous Development Team
-
-LocalAI is helped being maintained by a team of autonomous AI agents led by an AI Scrum Master.
-
- **Live Reports**: [reports.localai.io](http://reports.localai.io)
- **Project Board**: [Agent task tracking](https://github.com/users/mudler/projects/6)
- **Blog Post**: [Learn about the experiment](https://mudler.pm/posts/2026/02/28/a-call-to-open-source-maintainers-stop-babysitting-ai-how-i-built-a-100-local-autonomous-dev-team-to-maintain-localai-and-why-you-should-too/)
+- [Run LocalAI on AWS EKS with Pulumi](https://www.pulumi.com/ai/answers/tiZMDoZzZV6TLxgDXNBnFE/deploying-helm-charts-on-aws-eks)
+- [Run LocalAI on AWS](https://staleks.hashnode.dev/installing-localai-on-aws-ec2-instance)
+- [Create a slackbot for teams and OSS projects that answer to documentation](https://mudler.pm/posts/smart-slackbot-for-teams/)
+- [LocalAI meets k8sgpt](https://www.youtube.com/watch?v=PKrDNuJ_dfE)
+- [Question Answering on Documents locally with LangChain, LocalAI, Chroma, and GPT4All](https://mudler.pm/posts/localai-question-answering/)
+- [Tutorial to use k8sgpt with LocalAI](https://medium.com/@tyler_97636/k8sgpt-localai-unlock-kubernetes-superpowers-for-free-584790de9b65)

 ## Citation

@@ -199,38 +148,35 @@ If you utilize this repository, data in a downstream project, please consider ci
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},
 ```

-## Sponsors
+## ❤️ Sponsors

 > Do you find LocalAI useful?

 Support the project by becoming [a backer or sponsor](https://github.com/sponsors/mudler). Your logo will show up here with a link to your website.

-A huge thank you to our generous sponsors who support this project covering CI expenses, and our [Sponsor list](https://github.com/sponsors/mudler):
+A huge thank you to our generous sponsors who support this project:

-<p align="center">
-  <a href="https://www.spectrocloud.com/" target="blank">
-    <img height="200" src="https://github.com/user-attachments/assets/72eab1dd-8b93-4fc0-9ade-84db49f24962">
-  </a>
-  <a href="https://www.premai.io/" target="blank">
-    <img height="200" src="https://github.com/mudler/LocalAI/assets/2420543/42e4ca83-661e-4f79-8e46-ae43689683d6"> <br>
-  </a>
-</p>
+| ![Spectro Cloud logo_600x600px_transparent bg](https://github.com/go-skynet/LocalAI/assets/2420543/68a6f3cb-8a65-4a4d-99b5-6417a8905512) |
+|:-----------------------------------------------:|
+|  [Spectro Cloud](https://www.spectrocloud.com/)  |
+|  Spectro Cloud kindly supports LocalAI by providing GPU and computing resources to run tests on Lambda Labs!  |

-### Individual sponsors
+And a huge shout-out to individuals sponsoring the project by donating hardware or backing the project.

-A special thanks to individual sponsors, a full list is on [GitHub](https://github.com/sponsors/mudler) and [buymeacoffee](https://buymeacoffee.com/mudler). Special shout out to [drikster80](https://github.com/drikster80) for being generous. Thank you everyone!
+- [Sponsor list](https://github.com/sponsors/mudler)
+- JDAM00 (donating HW for the CI)

-## Star history
+## 🌟 Star history

 [![LocalAI Star history Chart](https://api.star-history.com/svg?repos=go-skynet/LocalAI&type=Date)](https://star-history.com/#go-skynet/LocalAI&Date)

-## License
+## 📖 License

 LocalAI is a community-driven project created by [Ettore Di Giacinto](https://github.com/mudler/).

-MIT - Author Ettore Di Giacinto <mudler@localai.io>
+MIT - Author Ettore Di Giacinto

-## Acknowledgements
+## 🙇 Acknowledgements

 LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

@@ -240,12 +186,12 @@ LocalAI couldn't have been built without the help of great software already avai
 - https://github.com/antimatter15/alpaca.cpp
 - https://github.com/EdVince/Stable-Diffusion-NCNN
 - https://github.com/ggerganov/whisper.cpp
+- https://github.com/saharNooby/rwkv.cpp
 - https://github.com/rhasspy/piper
- [exo](https://github.com/exo-explore/exo) for the MLX distributed auto-parallel sharding implementation

-## Contributors
+## 🤗 Contributors

-This is a community project, a special thanks to our contributors!
+This is a community project, a special thanks to our contributors! 🤗
 <a href="https://github.com/go-skynet/LocalAI/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=go-skynet/LocalAI" />
 </a>
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -8,24 +8,10 @@ At LocalAI, we take the security of our software seriously. We understand the im

 We provide support and updates for certain versions of our software. The following table outlines which versions are currently supported with security updates:

-| Version Series | Support Level | Details |
-| -------------- | ------------- | ------- |
-| 3.x | :white_check_mark: Actively supported | Full security updates and bug fixes for the latest minor versions. |
-| 2.x | :warning: Security fixes only | Critical security patches only, until **December 31, 2025**. |
-| 1.x | :x: End-of-life (EOL) | No longer supported as of **January 1, 2024**. No security fixes will be provided. |
-
-### What each support level means
-
- **Actively supported (3.x):** Receives all security updates, bug fixes, and new features. Users should stay on the latest 3.x minor release for the best protection.
- **Security fixes only (2.x):** Receives only critical security patches (e.g., remote code execution, authentication bypass, data exposure). No bug fixes or new features. Support ends December 31, 2025.
- **End-of-life (1.x):** No updates of any kind. Users on 1.x are strongly encouraged to upgrade immediately, as known vulnerabilities will not be patched.
-
-### Migrating from older versions
-
-If you are running an unsupported or soon-to-be-unsupported version, we recommend upgrading as soon as possible:
-
- **From 1.x to 3.x:** Version 1.x reached end-of-life on January 1, 2024. Review the [release notes](https://github.com/mudler/LocalAI/releases) for breaking changes across major versions, and upgrade directly to the latest 3.x release.
- **From 2.x to 3.x:** While 2.x still receives critical security patches until December 31, 2025, we recommend planning your migration to 3.x to benefit from ongoing improvements and full support.
+| Version | Supported          |
+| ------- | ------------------ |
+| >= 2.0  | :white_check_mark: |
+| < 2.0   | :x:                |

 Please ensure that you are using a supported version to receive the latest security updates.

--- a/aio/cpu/README.md
+++ b/aio/cpu/README.md
@@ -0,0 +1,5 @@
+## AIO CPU size
+
+Use this image on CPU-only hosts.
+
+Please keep using only C++ backends so the base image stays as small as possible (without CUDA, cuDNN, Python, etc.).
--- a/aio/cpu/embeddings.yaml
+++ b/aio/cpu/embeddings.yaml
@@ -0,0 +1,12 @@
+name: text-embedding-ada-002
+backend: bert-embeddings
+parameters:
+  model: huggingface://mudler/all-MiniLM-L6-v2/ggml-model-q4_0.bin
+
+usage: |
+    You can test this model with curl like this:
+
+    curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
+      "input": "Your text string goes here",
+      "model": "text-embedding-ada-002"
+    }'
--- a/aio/cpu/image-gen.yaml
+++ b/aio/cpu/image-gen.yaml
@@ -0,0 +1,62 @@
+name: stablediffusion
+backend: stablediffusion
+parameters:
+  model: stablediffusion_assets
+
+license: "BSD-3"
+urls:
+- https://github.com/EdVince/Stable-Diffusion-NCNN
+- https://github.com/EdVince/Stable-Diffusion-NCNN/blob/main/LICENSE
+
+description: |
+     Stable Diffusion in NCNN with C++, supporting txt2img and img2img
+
+download_files:
+- filename: "stablediffusion_assets/AutoencoderKL-256-256-fp16-opt.param"
+  sha256: "18ca4b66685e21406bcf64c484b3b680b4949900415536d599cc876579c85c82"
+  uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-256-256-fp16-opt.param"
+- filename: "stablediffusion_assets/AutoencoderKL-512-512-fp16-opt.param"
+  sha256: "cf45f63aacf3dbbab0f59ed92a6f2c14d9a1801314631cd3abe91e3c85639a20"
+  uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-512-512-fp16-opt.param"
+- filename: "stablediffusion_assets/AutoencoderKL-base-fp16.param"
+  sha256: "0254a056dce61b0c27dc9ec1b78b53bcf55315c540f55f051eb841aa992701ba"
+  uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-base-fp16.param"
+- filename: "stablediffusion_assets/AutoencoderKL-encoder-512-512-fp16.bin"
+  sha256: "ddcb79a9951b9f91e05e087739ed69da2c1c4ae30ba4168cce350b49d617c9fa"
+  uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/releases/download/naifu/AutoencoderKL-encoder-512-512-fp16.bin"
+- filename: "stablediffusion_assets/AutoencoderKL-fp16.bin"
+  sha256: "f02e71f80e70252734724bbfaed5c4ddd3a8ed7e61bb2175ff5f53099f0e35dd"
+  uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/releases/download/naifu/AutoencoderKL-fp16.bin"
+- filename: "stablediffusion_assets/FrozenCLIPEmbedder-fp16.bin"
+  sha256: "1c9a12f4e1dd1b295a388045f7f28a2352a4d70c3dc96a542189a3dd7051fdd6"
+  uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/releases/download/naifu/FrozenCLIPEmbedder-fp16.bin"
+- filename: "stablediffusion_assets/FrozenCLIPEmbedder-fp16.param"
+  sha256: "471afbe678dd1fd3fe764ef9c6eccaccb0a7d7e601f27b462aa926b20eb368c9"
+  uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/FrozenCLIPEmbedder-fp16.param"
+- filename: "stablediffusion_assets/log_sigmas.bin"
+  sha256: "a2089f8aa4c61f9c200feaec541ab3f5c94233b28deb6d5e8bcd974fa79b68ac"
+  uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/raw/main/x86/linux/assets/log_sigmas.bin"
+- filename: "stablediffusion_assets/UNetModel-256-256-MHA-fp16-opt.param"
+  sha256: "a58c380229f09491776df837b7aa7adffc0a87821dc4708b34535da2e36e3da1"
+  uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-256-256-MHA-fp16-opt.param"
+- filename: "stablediffusion_assets/UNetModel-512-512-MHA-fp16-opt.param"
+  sha256: "f12034067062827bd7f43d1d21888d1f03905401acf6c6eea22be23c259636fa"
+  uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-512-512-MHA-fp16-opt.param"
+- filename: "stablediffusion_assets/UNetModel-base-MHA-fp16.param"
+  sha256: "696f6975de49f4325b53ce32aff81861a6d6c07cd9ce3f0aae2cc405350af38d"
+  uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-base-MHA-fp16.param"
+- filename: "stablediffusion_assets/UNetModel-MHA-fp16.bin"
+  sha256: "d618918d011bfc1f644c0f2a33bf84931bd53b28a98492b0a8ed6f3a818852c3"
+  uri: "https://github.com/EdVince/Stable-Diffusion-NCNN/releases/download/naifu/UNetModel-MHA-fp16.bin"
+- filename: "stablediffusion_assets/vocab.txt"
+  sha256: "e30e57b6f1e47616982ef898d8922be24e535b4fa3d0110477b3a6f02ebbae7d"
+  uri: "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/vocab.txt"
+
+usage: |
+        curl http://localhost:8080/v1/images/generations \
+          -H "Content-Type: application/json" \
+          -d '{
+            "prompt": "<positive prompt>|<negative prompt>",
+            "step": 25,
+            "size": "512x512"
+          }'
--- a/aio/cpu/speech-to-text.yaml
+++ b/aio/cpu/speech-to-text.yaml
@@ -0,0 +1,18 @@
+name: whisper-1
+backend: whisper
+parameters:
+  model: ggml-whisper-base.bin
+
+usage: |
+    ## example audio file
+    wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
+
+    ## Send the example audio file to the transcriptions endpoint
+    curl http://localhost:8080/v1/audio/transcriptions \
+         -H "Content-Type: multipart/form-data" \
+         -F file="@$PWD/gb1.ogg" -F model="whisper-1"
+
+download_files:
+- filename: "ggml-whisper-base.bin"
+  sha256: "60ed5bc3dd14eea856493d334349b405782ddcaf0028d4b5df4088345fba2efe"
+  uri: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
--- a/aio/cpu/text-to-speech.yaml
+++ b/aio/cpu/text-to-speech.yaml
@@ -0,0 +1,15 @@
+name: tts-1
+download_files:
+  - filename: voice-en-us-amy-low.tar.gz
+    uri: https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-en-us-amy-low.tar.gz
+
+parameters:
+  model: en-us-amy-low.onnx
+
+usage: |
+    To test if this model works as expected, you can use the following curl command:
+
+    curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
+      "model":"tts-1",
+      "input": "Hi, this is a test."
+    }'
--- a/aio/cpu/text-to-text.yaml
+++ b/aio/cpu/text-to-text.yaml
@@ -0,0 +1,53 @@
+name: gpt-4
+mmap: true
+parameters:
+  model: huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q2_K.gguf
+
+template:
+  chat_message: |
+    <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
+    {{- if .FunctionCall }}<tool_call>{{end}}
+    {{- if eq .RoleName "tool" }}<tool_result>{{end }}
+    {{- if .Content}}
+    {{.Content}}
+    {{- end }}
+    {{- if .FunctionCall}}{{toJson .FunctionCall}}{{end }}
+    {{- if .FunctionCall }}</tool_call>{{end }}
+    {{- if eq .RoleName "tool" }}</tool_result>{{end }}
+    <|im_end|>
+  # https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF#prompt-format-for-function-calling
+  function: |
+    <|im_start|>system
+    You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
+    <tools>
+    {{range .Functions}}
+    {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
+    {{end}}
+    </tools>
+    Use the following pydantic model json schema for each tool call you will make:
+    {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
+    For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
+    <tool_call>
+    {'arguments': <args-dict>, 'name': <function-name>}
+    </tool_call>
+    <|im_end|>
+    {{.Input -}}
+    <|im_start|>assistant
+    <tool_call>
+  chat: |
+    {{.Input -}}
+    <|im_start|>assistant
+  completion: |
+    {{.Input}}
+context_size: 4096
+f16: true
+stopwords:
+- <|im_end|>
+- <dummy32000>
+- "\n</tool_call>"
+- "\n\n\n"
+usage: |
+      curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+          "model": "gpt-4",
+          "messages": [{"role": "user", "content": "How are you doing?"}], "temperature": 0.1
+      }'
--- a/aio/cpu/vision.yaml
+++ b/aio/cpu/vision.yaml
@@ -0,0 +1,31 @@
+backend: llama-cpp
+context_size: 4096
+f16: true
+mmap: true
+name: gpt-4-vision-preview
+
+roles:
+  user: "USER:"
+  assistant: "ASSISTANT:"
+  system: "SYSTEM:"
+
+mmproj: bakllava-mmproj.gguf
+parameters:
+  model: bakllava.gguf
+
+template:
+  chat: |
+    A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
+    {{.Input}}
+    ASSISTANT:
+
+download_files:
+- filename: bakllava.gguf
+  uri: huggingface://mys/ggml_bakllava-1/ggml-model-q4_k.gguf
+- filename: bakllava-mmproj.gguf
+  uri: huggingface://mys/ggml_bakllava-1/mmproj-model-f16.gguf
+
+usage: |
+    curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+        "model": "gpt-4-vision-preview",
+        "messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}]}], "temperature": 0.9}'
--- a/aio/entrypoint.sh
+++ b/aio/entrypoint.sh
@@ -0,0 +1,138 @@
+#!/bin/bash
+
+echo "===> LocalAI All-in-One (AIO) container starting..."
+
+GPU_ACCELERATION=false
+GPU_VENDOR=""
+
+function check_intel() {
+    if lspci | grep -E 'VGA|3D' | grep -iq intel; then
+        echo "Intel GPU detected"
+        if [ -d /opt/intel ]; then
+            GPU_ACCELERATION=true
+            GPU_VENDOR=intel
+        else
+            echo "Intel GPU detected, but Intel GPU drivers are not installed. GPU acceleration will not be available."
+        fi
+    fi
+}
+
+function check_nvidia_wsl() {
+    if lspci | grep -E 'VGA|3D' | grep -iq "Microsoft Corporation Device 008e"; then
+        # We assume this WSL2 card is NVIDIA, then check for nvidia-smi
+        # Make sure the container was run with `--gpus all` as the only required parameter
+        echo "NVIDIA GPU detected via WSL2"
+        # nvidia-smi should be installed in the container
+        if nvidia-smi; then
+            GPU_ACCELERATION=true
+            GPU_VENDOR=nvidia
+        else
+            echo "NVIDIA GPU detected via WSL2, but nvidia-smi is not installed or failed to run. GPU acceleration will not be available."
+        fi
+    fi
+}
+
+function check_amd() {
+    if lspci | grep -E 'VGA|3D' | grep -iq amd; then
+        echo "AMD GPU detected"
+        # Check if ROCm is installed
+        if [ -d /opt/rocm ]; then
+            GPU_ACCELERATION=true
+            GPU_VENDOR=amd
+        else
+            echo "AMD GPU detected, but ROCm is not installed. GPU acceleration will not be available."
+        fi
+    fi
+}
+
+function check_nvidia() {
+    if lspci | grep -E 'VGA|3D' | grep -iq nvidia; then
+        echo "NVIDIA GPU detected"
+        # nvidia-smi should be installed in the container
+        if nvidia-smi; then
+            GPU_ACCELERATION=true
+            GPU_VENDOR=nvidia
+        else
+            echo "NVIDIA GPU detected, but nvidia-smi is not installed or failed to run. GPU acceleration will not be available."
+        fi
+    fi
+}
+
+function check_metal() {
+    if system_profiler SPDisplaysDataType | grep -iq 'Metal'; then
+        echo "Apple Metal supported GPU detected"
+        GPU_ACCELERATION=true
+        GPU_VENDOR=apple
+    fi
+}
+
+function detect_gpu() {
+    case "$(uname -s)" in
+        Linux)
+            check_nvidia
+            check_amd
+            check_intel
+            check_nvidia_wsl
+            ;;
+        Darwin)
+            check_metal
+            ;;
+    esac
+}
+
+function detect_gpu_size() {
+    # Attempting to find GPU memory size for NVIDIA GPUs
+    if [ "$GPU_ACCELERATION" = true ] && [ "$GPU_VENDOR" = "nvidia" ]; then
+        echo "NVIDIA GPU detected. Attempting to find memory size..."
+        # Using head -n 1 to get the total memory of the 1st NVIDIA GPU detected.
+        # If handling multiple GPUs is required in the future, this is the place to do it
+        nvidia_sm=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
+        if [ -n "$nvidia_sm" ]; then
+            echo "Total GPU Memory: $nvidia_sm MiB"
+            # if bigger than 8GB, use 16GB
+            #if [ "$nvidia_sm" -gt 8192 ]; then
+            #    GPU_SIZE=gpu-16g
+            #else
+            GPU_SIZE=gpu-8g
+            #fi
+        else
+            echo "Unable to determine NVIDIA GPU memory size. Defaulting to the gpu-8g profile."
+            GPU_SIZE=gpu-8g
+        fi
+    elif [ "$GPU_ACCELERATION" = true ] && [ "$GPU_VENDOR" = "intel" ]; then
+        GPU_SIZE=intel
+    # Default to a generic GPU size until we implement GPU size detection for non NVIDIA GPUs
+    elif [ "$GPU_ACCELERATION" = true ]; then
+        echo "Non-NVIDIA GPU detected. Specific GPU memory size detection is not implemented."
+        GPU_SIZE=gpu-8g
+
+    # default to cpu if GPU_SIZE is not set
+    else
+        echo "GPU acceleration is not enabled or supported. Defaulting to CPU."
+        GPU_SIZE=cpu
+    fi
+}
+
+function check_vars() {
+    if [ -z "$MODELS" ]; then
+        echo "MODELS environment variable is not set. Please set it to a comma-separated list of model YAML files to load."
+        exit 1
+    fi
+
+    if [ -z "$PROFILE" ]; then
+        echo "PROFILE environment variable is not set. Please set it to one of the following: cpu, gpu-8g, gpu-16g, intel, apple"
+        exit 1
+    fi
+}
+
+detect_gpu
+detect_gpu_size
+
+PROFILE="${PROFILE:-$GPU_SIZE}" # default to the detected profile (cpu when no GPU is found)
+export MODELS="${MODELS:-/aio/${PROFILE}/embeddings.yaml,/aio/${PROFILE}/text-to-speech.yaml,/aio/${PROFILE}/image-gen.yaml,/aio/${PROFILE}/text-to-text.yaml,/aio/${PROFILE}/speech-to-text.yaml,/aio/${PROFILE}/vision.yaml}"
+
+check_vars
+
+echo "===> Starting LocalAI[$PROFILE] with the following models: $MODELS"
+
+exec /build/entrypoint.sh "$@"
--- a/aio/gpu-8g/embeddings.yaml
+++ b/aio/gpu-8g/embeddings.yaml
@@ -0,0 +1,12 @@
+name: text-embedding-ada-002
+backend: sentencetransformers
+parameters:
+  model: all-MiniLM-L6-v2
+
+usage: |
+    You can test this model with curl like this:
+
+    curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
+      "input": "Your text string goes here",
+      "model": "text-embedding-ada-002"
+    }'
--- a/aio/gpu-8g/image-gen.yaml
+++ b/aio/gpu-8g/image-gen.yaml
@@ -0,0 +1,25 @@
+name: stablediffusion
+parameters:
+  model: DreamShaper_8_pruned.safetensors
+backend: diffusers
+step: 25
+f16: true
+
+diffusers:
+  pipeline_type: StableDiffusionPipeline
+  cuda: true
+  enable_parameters: "negative_prompt,num_inference_steps"
+  scheduler_type: "k_dpmpp_2m"
+
+download_files:
+- filename: DreamShaper_8_pruned.safetensors
+  uri: huggingface://Lykon/DreamShaper/DreamShaper_8_pruned.safetensors
+
+usage: |
+        curl http://localhost:8080/v1/images/generations \
+          -H "Content-Type: application/json" \
+          -d '{
+            "prompt": "<positive prompt>|<negative prompt>",
+            "step": 25,
+            "size": "512x512"
+          }'
--- a/aio/gpu-8g/speech-to-text.yaml
+++ b/aio/gpu-8g/speech-to-text.yaml
@@ -0,0 +1,18 @@
+name: whisper-1
+backend: whisper
+parameters:
+  model: ggml-whisper-base.bin
+
+usage: |
+    ## example audio file
+    wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
+
+    ## Send the example audio file to the transcriptions endpoint
+    curl http://localhost:8080/v1/audio/transcriptions \
+         -H "Content-Type: multipart/form-data" \
+         -F file="@$PWD/gb1.ogg" -F model="whisper-1"
+
+download_files:
+- filename: "ggml-whisper-base.bin"
+  sha256: "60ed5bc3dd14eea856493d334349b405782ddcaf0028d4b5df4088345fba2efe"
+  uri: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
--- a/aio/gpu-8g/text-to-speech.yaml
+++ b/aio/gpu-8g/text-to-speech.yaml
@@ -0,0 +1,15 @@
+name: tts-1
+download_files:
+  - filename: voice-en-us-amy-low.tar.gz
+    uri: https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-en-us-amy-low.tar.gz
+
+parameters:
+  model: en-us-amy-low.onnx
+
+usage: |
+    To test if this model works as expected, you can use the following curl command:
+
+    curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
+      "model":"tts-1",
+      "input": "Hi, this is a test."
+    }'
--- a/aio/gpu-8g/text-to-text.yaml
+++ b/aio/gpu-8g/text-to-text.yaml
@@ -0,0 +1,53 @@
+name: gpt-4
+mmap: true
+parameters:
+  model: huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q6_K.gguf
+
+template:
+  chat_message: |
+    <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
+    {{- if .FunctionCall }}<tool_call>{{end}}
+    {{- if eq .RoleName "tool" }}<tool_result>{{end }}
+    {{- if .Content}}
+    {{.Content}}
+    {{- end }}
+    {{- if .FunctionCall}}{{toJson .FunctionCall}}{{end }}
+    {{- if .FunctionCall }}</tool_call>{{end }}
+    {{- if eq .RoleName "tool" }}</tool_result>{{end }}
+    <|im_end|>
+  # https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF#prompt-format-for-function-calling
+  function: |
+    <|im_start|>system
+    You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
+    <tools>
+    {{range .Functions}}
+    {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
+    {{end}}
+    </tools>
+    Use the following pydantic model json schema for each tool call you will make:
+    {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
+    For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
+    <tool_call>
+    {'arguments': <args-dict>, 'name': <function-name>}
+    </tool_call>
+    <|im_end|>
+    {{.Input -}}
+    <|im_start|>assistant
+    <tool_call>
+  chat: |
+    {{.Input -}}
+    <|im_start|>assistant
+  completion: |
+    {{.Input}}
+context_size: 4096
+f16: true
+stopwords:
+- <|im_end|>
+- <dummy32000>
+- "\n</tool_call>"
+- "\n\n\n"
+usage: |
+      curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+          "model": "gpt-4",
+          "messages": [{"role": "user", "content": "How are you doing?"}], "temperature": 0.1
+      }'
--- a/aio/gpu-8g/vision.yaml
+++ b/aio/gpu-8g/vision.yaml
@@ -0,0 +1,35 @@
+backend: llama-cpp
+context_size: 4096
+f16: true
+mmap: true
+name: gpt-4-vision-preview
+
+roles:
+  user: "USER:"
+  assistant: "ASSISTANT:"
+  system: "SYSTEM:"
+
+mmproj: llava-v1.6-7b-mmproj-f16.gguf
+parameters:
+  model: llava-v1.6-mistral-7b.Q5_K_M.gguf
+  temperature: 0.2
+  top_k: 40
+  top_p: 0.95
+  seed: -1
+
+template:
+  chat: |
+    A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
+    {{.Input}}
+    ASSISTANT:
+
+download_files:
+- filename: llava-v1.6-mistral-7b.Q5_K_M.gguf
+  uri: huggingface://cjpais/llava-1.6-mistral-7b-gguf/llava-v1.6-mistral-7b.Q5_K_M.gguf
+- filename: llava-v1.6-7b-mmproj-f16.gguf
+  uri: huggingface://cjpais/llava-1.6-mistral-7b-gguf/mmproj-model-f16.gguf
+
+usage: |
+    curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+        "model": "gpt-4-vision-preview",
+        "messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}]}], "temperature": 0.9}'
--- a/aio/intel/embeddings.yaml
+++ b/aio/intel/embeddings.yaml
@@ -0,0 +1,12 @@
+name: text-embedding-ada-002
+backend: sentencetransformers
+parameters:
+  model: all-MiniLM-L6-v2
+
+usage: |
+    You can test this model with curl like this:
+
+    curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{
+      "input": "Your text string goes here",
+      "model": "text-embedding-ada-002"
+    }'
--- a/aio/intel/image-gen.yaml
+++ b/aio/intel/image-gen.yaml
@@ -0,0 +1,20 @@
+name: stablediffusion
+parameters:
+  model: runwayml/stable-diffusion-v1-5
+backend: diffusers
+step: 25
+f16: true
+diffusers:
+  pipeline_type: StableDiffusionPipeline
+  cuda: true
+  enable_parameters: "negative_prompt,num_inference_steps"
+  scheduler_type: "k_dpmpp_2m"
+
+usage: |
+        curl http://localhost:8080/v1/images/generations \
+          -H "Content-Type: application/json" \
+          -d '{
+            "prompt": "<positive prompt>|<negative prompt>",
+            "step": 25,
+            "size": "512x512"
+          }'
--- a/aio/intel/speech-to-text.yaml
+++ b/aio/intel/speech-to-text.yaml
@@ -0,0 +1,18 @@
+name: whisper-1
+backend: whisper
+parameters:
+  model: ggml-whisper-base.bin
+
+usage: |
+    ## example audio file
+    wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
+
+    ## Send the example audio file to the transcriptions endpoint
+    curl http://localhost:8080/v1/audio/transcriptions \
+         -H "Content-Type: multipart/form-data" \
+         -F file="@$PWD/gb1.ogg" -F model="whisper-1"
+
+download_files:
+- filename: "ggml-whisper-base.bin"
+  sha256: "60ed5bc3dd14eea856493d334349b405782ddcaf0028d4b5df4088345fba2efe"
+  uri: "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
--- a/aio/intel/text-to-speech.yaml
+++ b/aio/intel/text-to-speech.yaml
@@ -0,0 +1,15 @@
+name: tts-1
+download_files:
+  - filename: voice-en-us-amy-low.tar.gz
+    uri: https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-en-us-amy-low.tar.gz
+
+parameters:
+  model: en-us-amy-low.onnx
+
+usage: |
+    To test if this model works as expected, you can use the following curl command:
+
+    curl http://localhost:8080/tts -H "Content-Type: application/json" -d '{
+      "model":"tts-1",
+      "input": "Hi, this is a test."
+    }'
--- a/aio/intel/text-to-text.yaml
+++ b/aio/intel/text-to-text.yaml
@@ -0,0 +1,53 @@
+name: gpt-4
+mmap: false
+f16: false
+parameters:
+  model: huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q6_K.gguf
+
+template:
+  chat_message: |
+    <|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
+    {{- if .FunctionCall }}<tool_call>{{end}}
+    {{- if eq .RoleName "tool" }}<tool_result>{{end }}
+    {{- if .Content}}
+    {{.Content}}
+    {{- end }}
+    {{- if .FunctionCall}}{{toJson .FunctionCall}}{{end }}
+    {{- if .FunctionCall }}</tool_call>{{end }}
+    {{- if eq .RoleName "tool" }}</tool_result>{{end }}
+    <|im_end|>
+  # https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF#prompt-format-for-function-calling
+  function: |
+    <|im_start|>system
+    You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
+    <tools>
+    {{range .Functions}}
+    {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
+    {{end}}
+    </tools>
+    Use the following pydantic model json schema for each tool call you will make:
+    {'title': 'FunctionCall', 'type': 'object', 'properties': {'arguments': {'title': 'Arguments', 'type': 'object'}, 'name': {'title': 'Name', 'type': 'string'}}, 'required': ['arguments', 'name']}
+    For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
+    <tool_call>
+    {'arguments': <args-dict>, 'name': <function-name>}
+    </tool_call>
+    <|im_end|>
+    {{.Input -}}
+    <|im_start|>assistant
+    <tool_call>
+  chat: |
+    {{.Input -}}
+    <|im_start|>assistant
+  completion: |
+    {{.Input}}
+context_size: 4096
+stopwords:
+- <|im_end|>
+- "\n</tool_call>"
+- <dummy32000>
+- "\n\n\n"
+usage: |
+      curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+          "model": "gpt-4",
+          "messages": [{"role": "user", "content": "How are you doing?"}], "temperature": 0.1
+      }'
--- a/aio/intel/vision.yaml
+++ b/aio/intel/vision.yaml
@@ -0,0 +1,35 @@
+backend: llama-cpp
+context_size: 4096
+mmap: false
+f16: false
+name: gpt-4-vision-preview
+
+roles:
+  user: "USER:"
+  assistant: "ASSISTANT:"
+  system: "SYSTEM:"
+
+mmproj: llava-v1.6-7b-mmproj-f16.gguf
+parameters:
+  model: llava-v1.6-mistral-7b.Q5_K_M.gguf
+  temperature: 0.2
+  top_k: 40
+  top_p: 0.95
+  seed: -1
+
+template:
+  chat: |
+    A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
+    {{.Input}}
+    ASSISTANT:
+
+download_files:
+- filename: llava-v1.6-mistral-7b.Q5_K_M.gguf
+  uri: huggingface://cjpais/llava-1.6-mistral-7b-gguf/llava-v1.6-mistral-7b.Q5_K_M.gguf
+- filename: llava-v1.6-7b-mmproj-f16.gguf
+  uri: huggingface://cjpais/llava-1.6-mistral-7b-gguf/mmproj-model-f16.gguf
+
+usage: |
+    curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
+        "model": "gpt-4-vision-preview",
+        "messages": [{"role": "user", "content": [{"type":"text", "text": "What is in the image?"}, {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" }}]}], "temperature": 0.9}'