diff --git a/.agents/adding-backends.md b/.agents/adding-backends.md index 4a37a298e..fb98c55f2 100644 --- a/.agents/adding-backends.md +++ b/.agents/adding-backends.md @@ -102,6 +102,24 @@ Multi-arch backends are NOT a single matrix entry with `platforms: 'linux/amd64, Entries whose `dockerfile` is `./backend/Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}` must also set a `builder-base-image` field pointing at a prebuilt base from `quay.io/go-skynet/ci-cache:base-grpc-*` (CI builds these via `.github/workflows/base-images.yml`). The mapping is by `(build-type, platforms)` — see existing entries for the pattern. CI uses these prebuilt bases to skip the gRPC compile (~25–35 min cold). Local `make backends/` ignores `builder-base-image` and uses the from-source path inside the Dockerfile, so you don't need quay access for local builds. +### Cover every OS the project supports (Linux **and** Darwin) + +`.github/backend-matrix.yml` has two matrices, and they are the source of truth for which OS a backend ships on: + +- `include:` — the **Linux** matrix (x86_64 + arm64; CPU and CUDA / ROCm / SYCL / Vulkan). +- `includeDarwin:` — the **macOS / Apple Silicon** matrix (arm64; Metal where the engine supports it, otherwise a native arm64 CPU build). + +**A new backend must target every OS it can build for — do not ship Linux-only by default.** A backend that appears only under `include:` is silently unavailable on macOS even when its code would run there. Most C/C++/GGML engines build on Darwin out of the box (ggml defaults `GGML_METAL=ON` on Apple, so a plain build is Metal-enabled), and many Python backends do too (CPU / MPS wheels). If a backend genuinely cannot support an OS (e.g. CUDA-only, no CPU variant), state that in the PR description instead of omitting it silently. + +Wiring a backend into `includeDarwin:` is more than the matrix entry: + +1. 
**`includeDarwin:` entry** — `tag-suffix: "-metal-darwin-arm64-"`, `build-type: "metal"`, `lang: "go"` for go+ggml backends; omit `build-type` for the bespoke C++ ones (llama-cpp / ds4 / privacy-filter). Match an existing entry of the same shape. +2. **`backend/index.yaml`** — add `metal:` to the backend's `capabilities` map (main and `-development`) and concrete `metal-` / `metal--development` image entries pointing at the `-metal-darwin-arm64-` images. +3. **C/C++ backends only** — add an `inferBackendPathDarwin` case in `scripts/changed-backends.js` returning `backend/cpp//` (the generic fallthrough assumes `backend//`, which is wrong for a C++ source tree driven with `lang: go`), and give `run.sh` a Darwin branch that exports `DYLD_LIBRARY_PATH` instead of `LD_LIBRARY_PATH`. If the build is bespoke (single `grpc-server` + dylib bundling), model it on `scripts/build/ds4-darwin.sh` and add a `backends/-darwin` make target plus a gated step in `.github/workflows/backend_build_darwin.yml`. +4. **C++ proto gotcha** — if the backend compiles the generated gRPC/protobuf in a separate CMake target (e.g. `hw_grpc_proto`), that target must link `protobuf::libprotobuf` + `gRPC::grpc++` so the Homebrew include dirs propagate; otherwise macOS fails with `google/protobuf/runtime_version.h not found` (Linux hides this because apt headers sit in `/usr/include`). + +The CI path filter only builds a backend on a PR when a file under its directory changes, so a darwin-only YAML edit builds nothing — touch a file under `backend///` (a one-line comment is enough) in the same PR. + ## 3. Add Backend Metadata to `backend/index.yaml` **Step 3a: Add Meta Definition** @@ -198,12 +216,34 @@ docker-build-backends: ... docker-build- - If the backend is in `backend/python//` but uses `.` as context in the workflow file, use `.` context - Check similar backends to determine the correct context +## Documenting the backend (README + docs) + +A backend is not "added" until it is discoverable. 
Update the user-facing docs: + +- **`docs/content/features/backends.md`** - add the backend to the right + category in the "LocalAI supports various types of backends" list (and add a + new category if it introduces a new modality, e.g. sound classification). +- If the backend introduces a **new API surface** (a new endpoint or a realtime + capability), document it under `docs/content/` where its area lives (audio, + vision, etc.) and follow the api-endpoints checklist in + [api-endpoints-and-auth.md](api-endpoints-and-auth.md). + +**If the backend is a native C/C++/GGML engine created and maintained by the +LocalAI team** (a from-scratch port like `parakeet.cpp`, `ced.cpp`, +`vibevoice.cpp`, `rf-detr.cpp`, not a wrapper around a third-party runtime), it +ALSO belongs in the top-level **`README.md`** table under "native C/C++/GGML +engines ... developed and maintained by the LocalAI project itself". Add a row +linking the upstream engine repo with a one-line description. This is the +project's showcase of its own engines; a new in-house backend that is missing +from it is a documentation bug. + ## 5. Verification Checklist After adding a new backend, verify: - [ ] Backend directory structure is complete with all necessary files - [ ] Build configurations added to `.github/backend-matrix.yml` for all desired platforms (per-arch entries with `platform-tag` for multi-arch; `builder-base-image` for llama-cpp / ik-llama-cpp / turboquant) +- [ ] **OS coverage considered**: added to `includeDarwin:` (macOS/Apple Silicon) if the backend can build there — with the `backend/index.yaml` `metal:` capability + `metal-` image entries, a `run.sh` Darwin/DYLD branch and `inferBackendPathDarwin` case for C++ backends — or the PR explains why an OS is unsupported. Do not ship Linux-only by default. 
- [ ] Meta definition added to `backend/index.yaml` in the `## metas` section - [ ] Image entries added to `backend/index.yaml` for all build variants (latest + development) - [ ] Tag suffixes match between workflow file and index.yaml @@ -211,6 +251,8 @@ After adding a new backend, verify: - [ ] No YAML syntax errors (check with linter) - [ ] No Makefile syntax errors (check with linter) - [ ] Follows the same pattern as similar backends (e.g., if it's a transcription backend, follow `faster-whisper` pattern) +- [ ] Documented: added to the category list in `docs/content/features/backends.md` (and any new endpoint/realtime capability documented under `docs/content/`) +- [ ] If it is an in-house native C/C++/GGML engine, added to the maintained-engines table in the top-level `README.md` ## Bundling runtime shared libraries (`package.sh`) diff --git a/.docker/install-base-deps.sh b/.docker/install-base-deps.sh index 5b0908fa8..2b0e7e0c6 100755 --- a/.docker/install-base-deps.sh +++ b/.docker/install-base-deps.sh @@ -70,6 +70,12 @@ if [ "${BUILD_TYPE:-}" = "vulkan" ] && [ "${SKIP_DRIVERS:-false}" = "false" ]; t git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \ ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \ clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils + # Mesa Vulkan ICD drivers (ANV/RADV/lavapipe + Arm SoC) and their ICD + # manifests. The LunarG SDK below only provides the loader and shader + # tooling, not hardware drivers — without Mesa the packaged Vulkan backend + # would ship a loader that finds no GPU. package-gpu-libs.sh bundles these + # .so files plus their deps into the backend so it stays self-contained. 
+ apt-get install -y mesa-vulkan-drivers libdrm2 if [ "amd64" = "${TARGETARCH:-}" ]; then wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz diff --git a/.docker/llama-cpp-compile.sh b/.docker/llama-cpp-compile.sh index bbc9aa21f..647a1c448 100755 --- a/.docker/llama-cpp-compile.sh +++ b/.docker/llama-cpp-compile.sh @@ -17,19 +17,29 @@ if [[ -n "${CUDA_DOCKER_ARCH:-}" ]]; then rm -rf /LocalAI/backend/cpp/llama-cpp-*-build fi -if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then - cd /LocalAI/backend/cpp/llama-cpp - make llama-cpp-fallback - make llama-cpp-grpc - make llama-cpp-rpc-server +cd /LocalAI/backend/cpp/llama-cpp +if [ -z "${BUILD_TYPE:-}" ]; then + # Pure CPU image (BUILD_TYPE empty): one build with ggml CPU_ALL_VARIANTS replaces the + # per-microarch binaries (x86: avx/avx2/avx512/fallback; arm64: armv8.x/armv9.x). ggml + # dlopens the best libggml-cpu-*.so at runtime by probing host CPU features. + # + # arm64: the CPU_ALL_VARIANTS table includes armv9.2 SME variants whose -march=...+sme is + # rejected by the Ubuntu 24.04 default gcc-13. gcc-14 accepts it, so build the arm64 + # variants with it (the host never *selects* SME unless it has it, but every variant must + # still compile). + if [ "${TARGETARCH}" = "arm64" ]; then + apt-get update -qq && apt-get install -y -qq gcc-14 g++-14 + export CC=gcc-14 CXX=g++-14 + fi + make llama-cpp-cpu-all else - cd /LocalAI/backend/cpp/llama-cpp - make llama-cpp-avx - make llama-cpp-avx2 - make llama-cpp-avx512 + # GPU build (cublas/hipblas/sycl/vulkan/...): the accelerator does the compute, so a + # single fallback CPU build is enough - no per-microarch CPU variants needed. (This also + # keeps the heavy GPU backend compile from also building the whole CPU variant matrix, + # and avoids the gcc-14 apt step on GPU base images such as nvidia l4t.) 
make llama-cpp-fallback - make llama-cpp-grpc - make llama-cpp-rpc-server fi +make llama-cpp-grpc +make llama-cpp-rpc-server ccache -s || true diff --git a/.docker/turboquant-compile.sh b/.docker/turboquant-compile.sh index 7468bc1a7..ca6cf2690 100755 --- a/.docker/turboquant-compile.sh +++ b/.docker/turboquant-compile.sh @@ -19,17 +19,21 @@ fi cd /LocalAI/backend/cpp/turboquant -if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then - make turboquant-fallback - make turboquant-grpc - make turboquant-rpc-server +if [ -z "${BUILD_TYPE:-}" ]; then + # Pure CPU image: one ggml CPU_ALL_VARIANTS build replaces the per-microarch binaries. + # arm64: the armv9.2 SME variants need gcc-14 (gcc-13 rejects +sme). + if [ "${TARGETARCH}" = "arm64" ]; then + apt-get update -qq && apt-get install -y -qq gcc-14 g++-14 + export CC=gcc-14 CXX=g++-14 + fi + make turboquant-cpu-all else - make turboquant-avx - make turboquant-avx2 - make turboquant-avx512 + # GPU build (cublas/hipblas/sycl/vulkan/...): single fallback CPU build, the accelerator + # does the compute. Keeps the GPU compile from also building the CPU variant matrix and + # avoids the gcc-14 apt step on GPU base images such as nvidia l4t. make turboquant-fallback - make turboquant-grpc - make turboquant-rpc-server fi +make turboquant-grpc +make turboquant-rpc-server ccache -s || true diff --git a/.github/backend-matrix.yml b/.github/backend-matrix.yml index c2c6638ec..b66f1bbf3 100644 --- a/.github/backend-matrix.yml +++ b/.github/backend-matrix.yml @@ -2,6 +2,28 @@ # Matrix data for backend container image builds. # Consumed by scripts/changed-backends.js for both backend.yml and backend_pr.yml. # This file is NOT a workflow — it has no top-level 'on:' or 'jobs:'. +# +# OS / platform coverage — READ THIS WHEN ADDING A BACKEND +# -------------------------------------------------------- +# This file is the source of truth for which OS each backend is built and +# published for. 
A backend ships ONLY for the matrices it appears in: +# - Linux -> the `include:` matrix below (x86_64 + arm64; CPU and +# CUDA / ROCm / SYCL / Vulkan variants). +# - macOS -> the `includeDarwin:` matrix (Apple Silicon / arm64; Metal where +# the engine supports it, otherwise a native arm64 CPU build). +# +# New backends must target EVERY OS they can build for, not just Linux. A backend +# listed only under `include:` is silently unavailable on macOS even when its code +# would run there. Most C/C++/GGML engines build on Darwin (ggml defaults +# GGML_METAL=ON on Apple, so a plain build is Metal-enabled), and many Python +# backends do too (CPU / MPS). If a backend genuinely cannot support an OS, say so +# in its PR description rather than silently omitting it. +# +# Adding a backend to `includeDarwin:` is more than one line — see the darwin +# checklist in .agents/adding-backends.md (includeDarwin entry, the index.yaml +# `metal:` capability + `metal-` image entries, a `run.sh` Darwin/DYLD +# branch for C/C++ backends, and the inferBackendPathDarwin case in +# scripts/changed-backends.js so the path filter actually builds it). # Linux matrix (consumed by backend-jobs). 
include: @@ -3575,6 +3597,154 @@ include: dockerfile: "./backend/Dockerfile.golang" context: "./" ubuntu-version: '2404' + # ced + - build-type: 'cublas' + cuda-major-version: "12" + cuda-minor-version: "8" + platforms: 'linux/amd64' + tag-latest: 'auto' + tag-suffix: '-gpu-nvidia-cuda-12-ced' + runs-on: 'ubuntu-latest' + base-image: "ubuntu:24.04" + skip-drivers: 'false' + backend: "ced" + dockerfile: "./backend/Dockerfile.golang" + context: "./" + ubuntu-version: '2404' + - build-type: 'cublas' + cuda-major-version: "13" + cuda-minor-version: "0" + platforms: 'linux/amd64' + tag-latest: 'auto' + tag-suffix: '-gpu-nvidia-cuda-13-ced' + runs-on: 'ubuntu-latest' + base-image: "ubuntu:24.04" + skip-drivers: 'false' + backend: "ced" + dockerfile: "./backend/Dockerfile.golang" + context: "./" + ubuntu-version: '2404' + - build-type: 'cublas' + cuda-major-version: "13" + cuda-minor-version: "0" + platforms: 'linux/arm64' + skip-drivers: 'false' + tag-latest: 'auto' + tag-suffix: '-nvidia-l4t-cuda-13-arm64-ced' + base-image: "ubuntu:24.04" + ubuntu-version: '2404' + runs-on: 'ubuntu-24.04-arm' + backend: "ced" + dockerfile: "./backend/Dockerfile.golang" + context: "./" + - build-type: '' + cuda-major-version: "" + cuda-minor-version: "" + platforms: 'linux/amd64' + platform-tag: 'amd64' + tag-latest: 'auto' + tag-suffix: '-cpu-ced' + runs-on: 'ubuntu-latest' + base-image: "ubuntu:24.04" + skip-drivers: 'false' + backend: "ced" + dockerfile: "./backend/Dockerfile.golang" + context: "./" + ubuntu-version: '2404' + - build-type: '' + cuda-major-version: "" + cuda-minor-version: "" + platforms: 'linux/arm64' + platform-tag: 'arm64' + tag-latest: 'auto' + tag-suffix: '-cpu-ced' + runs-on: 'ubuntu-24.04-arm' + base-image: "ubuntu:24.04" + skip-drivers: 'false' + backend: "ced" + dockerfile: "./backend/Dockerfile.golang" + context: "./" + ubuntu-version: '2404' + - build-type: 'sycl_f32' + cuda-major-version: "" + cuda-minor-version: "" + platforms: 'linux/amd64' + tag-latest: 
'auto' + tag-suffix: '-gpu-intel-sycl-f32-ced' + runs-on: 'ubuntu-latest' + base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04" + skip-drivers: 'false' + backend: "ced" + dockerfile: "./backend/Dockerfile.golang" + context: "./" + ubuntu-version: '2404' + - build-type: 'sycl_f16' + cuda-major-version: "" + cuda-minor-version: "" + platforms: 'linux/amd64' + tag-latest: 'auto' + tag-suffix: '-gpu-intel-sycl-f16-ced' + runs-on: 'ubuntu-latest' + base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04" + skip-drivers: 'false' + backend: "ced" + dockerfile: "./backend/Dockerfile.golang" + context: "./" + ubuntu-version: '2404' + - build-type: 'vulkan' + cuda-major-version: "" + cuda-minor-version: "" + platforms: 'linux/amd64' + platform-tag: 'amd64' + tag-latest: 'auto' + tag-suffix: '-gpu-vulkan-ced' + runs-on: 'ubuntu-latest' + base-image: "ubuntu:24.04" + skip-drivers: 'false' + backend: "ced" + dockerfile: "./backend/Dockerfile.golang" + context: "./" + ubuntu-version: '2404' + - build-type: 'vulkan' + cuda-major-version: "" + cuda-minor-version: "" + platforms: 'linux/arm64' + platform-tag: 'arm64' + tag-latest: 'auto' + tag-suffix: '-gpu-vulkan-ced' + runs-on: 'ubuntu-24.04-arm' + base-image: "ubuntu:24.04" + skip-drivers: 'false' + backend: "ced" + dockerfile: "./backend/Dockerfile.golang" + context: "./" + ubuntu-version: '2404' + - build-type: 'cublas' + cuda-major-version: "12" + cuda-minor-version: "0" + platforms: 'linux/arm64' + skip-drivers: 'false' + tag-latest: 'auto' + tag-suffix: '-nvidia-l4t-arm64-ced' + base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0" + runs-on: 'ubuntu-24.04-arm' + backend: "ced" + dockerfile: "./backend/Dockerfile.golang" + context: "./" + ubuntu-version: '2204' + - build-type: 'hipblas' + cuda-major-version: "" + cuda-minor-version: "" + platforms: 'linux/amd64' + tag-latest: 'auto' + tag-suffix: '-gpu-rocm-hipblas-ced' + base-image: "rocm/dev-ubuntu-24.04:7.2.1" + runs-on: 'ubuntu-latest' + skip-drivers: 
'false' + backend: "ced" + dockerfile: "./backend/Dockerfile.golang" + context: "./" + ubuntu-version: '2404' # acestep-cpp - build-type: '' cuda-major-version: "" @@ -4754,6 +4924,10 @@ includeDarwin: tag-suffix: "-metal-darwin-arm64-parakeet-cpp" build-type: "metal" lang: "go" + - backend: "ced" + tag-suffix: "-metal-darwin-arm64-ced" + build-type: "metal" + lang: "go" - backend: "acestep-cpp" tag-suffix: "-metal-darwin-arm64-acestep-cpp" build-type: "metal" @@ -4770,6 +4944,31 @@ includeDarwin: tag-suffix: "-metal-darwin-arm64-vibevoice-cpp" build-type: "metal" lang: "go" + # Vision/utility C++/ggml backends (go+cgo). Their Makefiles already carry a + # Darwin/Metal path (GGML_METAL=ON when build-type=metal); this just builds and + # publishes the metal image so Apple Silicon can install them. + - backend: "depth-anything-cpp" + tag-suffix: "-metal-darwin-arm64-depth-anything-cpp" + build-type: "metal" + lang: "go" + - backend: "locate-anything-cpp" + tag-suffix: "-metal-darwin-arm64-locate-anything-cpp" + build-type: "metal" + lang: "go" + - backend: "rfdetr-cpp" + tag-suffix: "-metal-darwin-arm64-rfdetr-cpp" + build-type: "metal" + lang: "go" + - backend: "sam3-cpp" + tag-suffix: "-metal-darwin-arm64-sam3-cpp" + build-type: "metal" + lang: "go" + # LocalVQE has no Metal path; on Apple Silicon it builds CPU-only (GGML_METAL + # OFF) but is still a native arm64 image. Uses the darwin/metal build profile. + - backend: "localvqe" + tag-suffix: "-metal-darwin-arm64-localvqe" + build-type: "metal" + lang: "go" - backend: "voxtral" tag-suffix: "-metal-darwin-arm64-voxtral" build-type: "metal" @@ -4822,6 +5021,19 @@ includeDarwin: - backend: "kitten-tts" tag-suffix: "-metal-darwin-arm64-kitten-tts" build-type: "mps" + # vLLM on Apple Silicon via vllm-metal (MLX). 
The install is custom + # (backend/python/vllm/install.sh has a darwin branch); lang stays python so + # backend_build_darwin.yml drives it through build-darwin-python-backend -> + # scripts/build/python-darwin.sh, which runs the backend's install.sh. + - backend: "vllm" + tag-suffix: "-metal-darwin-arm64-vllm" + build-type: "mps" + - backend: "trl" + tag-suffix: "-metal-darwin-arm64-trl" + build-type: "mps" + - backend: "liquid-audio" + tag-suffix: "-metal-darwin-arm64-liquid-audio" + build-type: "mps" - backend: "piper" tag-suffix: "-metal-darwin-arm64-piper" build-type: "metal" @@ -4838,6 +5050,10 @@ includeDarwin: tag-suffix: "-metal-darwin-arm64-sherpa-onnx" build-type: "metal" lang: "go" + - backend: "supertonic" + tag-suffix: "-metal-darwin-arm64-supertonic" + build-type: "metal" + lang: "go" - backend: "local-store" tag-suffix: "-metal-darwin-arm64-local-store" build-type: "metal" diff --git a/.github/bump_vllm_metal.sh b/.github/bump_vllm_metal.sh new file mode 100755 index 000000000..f842680d5 --- /dev/null +++ b/.github/bump_vllm_metal.sh @@ -0,0 +1,55 @@ +#!/bin/bash +# Bump the single vllm-metal pin (VLLM_METAL_VERSION) in the vLLM backend's +# darwin (Apple Silicon) install path. The macOS/Metal build +# (backend/python/vllm/install.sh, Darwin branch) installs vllm-metal, which is +# version-locked to a specific vLLM source release. install.sh derives that vLLM +# version at build time from vllm-metal's own installer (`vllm_v=`) at the pinned +# tag, so there is only ONE value to bump here -- mirroring bump_vllm_wheel.sh, +# which bumps the Linux cu130 wheel pin. +# +# This deliberately tracks vllm-project/vllm-metal, NOT vllm-project/vllm: the +# darwin build can only use the exact vLLM version vllm-metal supports, so it may +# lag the Linux pin (requirements-cublas13-after.txt) until vllm-metal catches up. 
+set -xe +REPO=$1 # vllm-project/vllm-metal +FILE=$2 # backend/python/vllm/install.sh +VAR=$3 # VLLM_METAL_VERSION (used for the workflow's output file names) + +if [ -z "$FILE" ] || [ -z "$REPO" ] || [ -z "$VAR" ]; then + echo "usage: $0 " >&2 + exit 1 +fi + +# vllm-metal ships frequent dev releases, all flagged as non-prerelease, so +# /releases/latest returns the newest one (with its cp312 wheel asset). +LATEST_TAG=$(curl -sS -H "Accept: application/vnd.github+json" \ + "https://api.github.com/repos/$REPO/releases/latest" \ + | python3 -c "import json,sys; print(json.load(sys.stdin)['tag_name'])") + +# The coupled vLLM source version lives in vllm-metal's installer at that tag. +NEW_VLLM_VERSION=$(curl -fsSL \ + "https://raw.githubusercontent.com/$REPO/$LATEST_TAG/install.sh" \ + | grep -oE 'vllm_v="[0-9]+\.[0-9]+\.[0-9]+"' | head -1 | cut -d'"' -f2) + +if [ -z "$LATEST_TAG" ] || [ -z "$NEW_VLLM_VERSION" ]; then + echo "Could not resolve vllm-metal tag ($LATEST_TAG) or its vllm_v ($NEW_VLLM_VERSION)." >&2 + exit 1 +fi + +set +e +CURRENT_TAG=$(grep -oE 'VLLM_METAL_VERSION="[^"]*"' "$FILE" | head -1 | cut -d'"' -f2) +set -e + +# Rewrite the single pin. install.sh derives VLLM_VERSION from this tag at build +# time, so there is nothing else to touch. peter-evans/create-pull-request opens +# no PR on a clean tree, so a no-op rewrite (already current) is safe. +sed -i "$FILE" \ + -e "s|VLLM_METAL_VERSION=\"[^\"]*\"|VLLM_METAL_VERSION=\"$LATEST_TAG\"|" + +if [ -z "$CURRENT_TAG" ]; then + echo "Could not find VLLM_METAL_VERSION=\"...\" in $FILE." 
>&2 + exit 0 +fi + +echo "vllm-metal ${CURRENT_TAG} -> ${LATEST_TAG} (builds vLLM ${NEW_VLLM_VERSION}): https://github.com/$REPO/releases/tag/${LATEST_TAG}" >> "${VAR}_message.txt" +echo "${LATEST_TAG}" >> "${VAR}_commit.txt" diff --git a/.github/workflows/backend.yml b/.github/workflows/backend.yml index b41c3d4dd..705768b59 100644 --- a/.github/workflows/backend.yml +++ b/.github/workflows/backend.yml @@ -44,7 +44,7 @@ jobs: has-merges-singlearch: ${{ steps.set-matrix.outputs['has-merges-singlearch'] }} steps: - name: Checkout repository - uses: actions/checkout@v6 + uses: actions/checkout@v7 - name: Setup Bun uses: oven-sh/setup-bun@v2 diff --git a/.github/workflows/backend_build.yml b/.github/workflows/backend_build.yml index b3e177bd1..05d50cf82 100644 --- a/.github/workflows/backend_build.yml +++ b/.github/workflows/backend_build.yml @@ -101,7 +101,7 @@ jobs: steps: - name: Checkout - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true diff --git a/.github/workflows/backend_build_darwin.yml b/.github/workflows/backend_build_darwin.yml index 61f87eff6..749ffd4de 100644 --- a/.github/workflows/backend_build_darwin.yml +++ b/.github/workflows/backend_build_darwin.yml @@ -57,7 +57,7 @@ jobs: HOMEBREW_NO_ANALYTICS: '1' steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true diff --git a/.github/workflows/backend_merge.yml b/.github/workflows/backend_merge.yml index c05fece8d..37f606aa9 100644 --- a/.github/workflows/backend_merge.yml +++ b/.github/workflows/backend_merge.yml @@ -49,7 +49,7 @@ jobs: # Sparse checkout: the merge job needs `.github/scripts/` (for the # keepalive cleanup script) but none of the source tree. 
- name: Checkout (.github/scripts only) - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: sparse-checkout: | .github/scripts diff --git a/.github/workflows/backend_pr.yml b/.github/workflows/backend_pr.yml index e9520a548..9517651e4 100644 --- a/.github/workflows/backend_pr.yml +++ b/.github/workflows/backend_pr.yml @@ -23,7 +23,7 @@ jobs: has-merges-singlearch: ${{ steps.set-matrix.outputs['has-merges-singlearch'] }} steps: - name: Checkout repository - uses: actions/checkout@v6 + uses: actions/checkout@v7 - name: Setup Bun uses: oven-sh/setup-bun@v2 diff --git a/.github/workflows/base-images.yml b/.github/workflows/base-images.yml index 6152e1c56..637b603a5 100644 --- a/.github/workflows/base-images.yml +++ b/.github/workflows/base-images.yml @@ -127,7 +127,7 @@ jobs: # the original l4t matrix entry which set skip-drivers: 'true'. skip-drivers: 'true' steps: - - uses: actions/checkout@v6 + - uses: actions/checkout@v7 with: submodules: false - name: Free disk space diff --git a/.github/workflows/build-test.yaml b/.github/workflows/build-test.yaml index e634848eb..5b23ddf77 100644 --- a/.github/workflows/build-test.yaml +++ b/.github/workflows/build-test.yaml @@ -11,7 +11,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Checkout - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: fetch-depth: 0 - name: Set up Go @@ -25,7 +25,7 @@ jobs: runs-on: macos-latest steps: - name: Checkout - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: fetch-depth: 0 - name: Set up Go @@ -47,7 +47,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Checkout - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: fetch-depth: 0 - name: Configure apt mirror on runner diff --git a/.github/workflows/bump-inference-defaults.yml b/.github/workflows/bump-inference-defaults.yml index 50485b5f1..0b3afa454 100644 --- a/.github/workflows/bump-inference-defaults.yml +++ b/.github/workflows/bump-inference-defaults.yml @@ -14,7 +14,7 @@ jobs: bump: runs-on: 
ubuntu-latest steps: - - uses: actions/checkout@v6 + - uses: actions/checkout@v7 - uses: actions/setup-go@v5 with: diff --git a/.github/workflows/bump_deps.yaml b/.github/workflows/bump_deps.yaml index 6dbf8dcf2..a2c37881f 100644 --- a/.github/workflows/bump_deps.yaml +++ b/.github/workflows/bump_deps.yaml @@ -42,6 +42,10 @@ jobs: variable: "PARAKEET_VERSION" branch: "master" file: "backend/go/parakeet-cpp/Makefile" + - repository: "mudler/ced.cpp" + variable: "CED_VERSION" + branch: "master" + file: "backend/go/ced/Makefile" - repository: "mudler/depth-anything.cpp" variable: "DEPTHANYTHING_VERSION" branch: "master" @@ -88,7 +92,7 @@ jobs: file: "backend/go/vibevoice-cpp/Makefile" runs-on: ubuntu-latest steps: - - uses: actions/checkout@v6 + - uses: actions/checkout@v7 - name: Bump dependencies 🔧 id: bump run: | @@ -124,7 +128,7 @@ jobs: if: github.repository == 'mudler/LocalAI' runs-on: ubuntu-latest steps: - - uses: actions/checkout@v6 + - uses: actions/checkout@v7 - name: Bump vLLM cu130 wheel pin 🔧 id: bump run: | @@ -150,3 +154,39 @@ jobs: branch: "update/VLLM_VERSION" body: ${{ steps.bump.outputs.message }} signoff: true + + bump-vllm-metal: + # The darwin (Apple Silicon) vLLM build installs vllm-metal, which is locked + # to a specific vLLM source release. install.sh pins only VLLM_METAL_VERSION; + # it derives the matching VLLM_VERSION from vllm-metal's installer at build + # time, so this job tracks vllm-project/vllm-metal and bumps that single pin. + # Separate from bump-vllm-wheel because darwin follows vllm-metal, not vllm latest.
+ if: github.repository == 'mudler/LocalAI' + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v7 + - name: Bump vllm-metal pin 🔧 + id: bump + run: | + bash .github/bump_vllm_metal.sh vllm-project/vllm-metal backend/python/vllm/install.sh VLLM_METAL_VERSION + { + echo 'message<> "$GITHUB_OUTPUT" + { + echo 'commit<> "$GITHUB_OUTPUT" + rm -rfv VLLM_METAL_VERSION_message.txt VLLM_METAL_VERSION_commit.txt + - name: Create Pull Request + uses: peter-evans/create-pull-request@v8 + with: + token: ${{ secrets.UPDATE_BOT_TOKEN }} + push-to-fork: ci-forks/LocalAI + commit-message: ':arrow_up: Update vllm-project/vllm-metal (darwin)' + title: 'chore: :arrow_up: Update vllm-metal (darwin) to `${{ steps.bump.outputs.commit }}`' + branch: "update/VLLM_METAL_VERSION" + body: ${{ steps.bump.outputs.message }} + signoff: true diff --git a/.github/workflows/bump_docs.yaml b/.github/workflows/bump_docs.yaml index 1fe355580..444f7fed7 100644 --- a/.github/workflows/bump_docs.yaml +++ b/.github/workflows/bump_docs.yaml @@ -13,7 +13,7 @@ jobs: - repository: "mudler/LocalAI" runs-on: ubuntu-latest steps: - - uses: actions/checkout@v6 + - uses: actions/checkout@v7 - name: Bump dependencies 🔧 run: | bash .github/bump_docs.sh ${{ matrix.repository }} diff --git a/.github/workflows/checksum_checker.yaml b/.github/workflows/checksum_checker.yaml index 4952f69c5..3652c65bb 100644 --- a/.github/workflows/checksum_checker.yaml +++ b/.github/workflows/checksum_checker.yaml @@ -8,7 +8,7 @@ jobs: if: github.repository == 'mudler/LocalAI' runs-on: ubuntu-latest steps: - - uses: actions/checkout@v6 + - uses: actions/checkout@v7 - name: Configure apt mirror on runner uses: ./.github/actions/configure-apt-mirror - name: Install dependencies diff --git a/.github/workflows/deploy-explorer.yaml b/.github/workflows/deploy-explorer.yaml index 5c2a0e354..7914d2054 100644 --- a/.github/workflows/deploy-explorer.yaml +++ b/.github/workflows/deploy-explorer.yaml @@ -16,7 +16,7 @@ jobs: runs-on: 
ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - uses: actions/setup-go@v5 diff --git a/.github/workflows/gallery-agent.yaml b/.github/workflows/gallery-agent.yaml index ceb87b2d8..dfe9d40fa 100644 --- a/.github/workflows/gallery-agent.yaml +++ b/.github/workflows/gallery-agent.yaml @@ -31,7 +31,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Checkout repository - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: token: ${{ secrets.GITHUB_TOKEN }} diff --git a/.github/workflows/generate_intel_image.yaml b/.github/workflows/generate_intel_image.yaml index 27f1a0a3e..22627387f 100644 --- a/.github/workflows/generate_intel_image.yaml +++ b/.github/workflows/generate_intel_image.yaml @@ -44,7 +44,7 @@ jobs: uses: docker/setup-buildx-action@master - name: Checkout - uses: actions/checkout@v6 + uses: actions/checkout@v7 - name: Cache Intel images uses: docker/build-push-action@v7 diff --git a/.github/workflows/gh-pages.yml b/.github/workflows/gh-pages.yml index b21627ae1..9df4e4f6f 100644 --- a/.github/workflows/gh-pages.yml +++ b/.github/workflows/gh-pages.yml @@ -28,7 +28,7 @@ jobs: HUGO_VERSION: "0.146.3" steps: - name: Checkout - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: fetch-depth: 0 # needed for enableGitInfo submodules: true diff --git a/.github/workflows/image_build.yml b/.github/workflows/image_build.yml index b953ddbb2..89bc4124f 100644 --- a/.github/workflows/image_build.yml +++ b/.github/workflows/image_build.yml @@ -80,7 +80,7 @@ jobs: steps: - name: Checkout - uses: actions/checkout@v6 + uses: actions/checkout@v7 - name: Configure apt mirror on runner id: apt_mirror diff --git a/.github/workflows/image_merge.yml b/.github/workflows/image_merge.yml index 47b3f48a8..18d64d407 100644 --- a/.github/workflows/image_merge.yml +++ b/.github/workflows/image_merge.yml @@ -36,7 +36,7 @@ jobs: # Sparse checkout: needed for .github/scripts/ (the keepalive cleanup # script). 
Skips the rest of the source tree. - name: Checkout (.github/scripts only) - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: sparse-checkout: | .github/scripts diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml index f9913229a..54d0a740c 100644 --- a/.github/workflows/lint.yml +++ b/.github/workflows/lint.yml @@ -20,7 +20,7 @@ jobs: golangci-lint: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v6 + - uses: actions/checkout@v7 with: # Full history so golangci-lint's new-from-merge-base can reach # origin/master and compute the diff against it. diff --git a/.github/workflows/release.yaml b/.github/workflows/release.yaml index a94c64fd4..614c1de3e 100644 --- a/.github/workflows/release.yaml +++ b/.github/workflows/release.yaml @@ -10,7 +10,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Checkout - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: fetch-depth: 0 - name: Set up Go @@ -28,7 +28,7 @@ jobs: runs-on: macos-latest steps: - name: Checkout - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: fetch-depth: 0 - name: Set up Go @@ -46,7 +46,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Checkout - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: fetch-depth: 0 - name: Configure apt mirror on runner diff --git a/.github/workflows/secscan.yaml b/.github/workflows/secscan.yaml index a8bac30dd..b5bf8e2be 100644 --- a/.github/workflows/secscan.yaml +++ b/.github/workflows/secscan.yaml @@ -14,7 +14,7 @@ jobs: GO111MODULE: on steps: - name: Checkout Source - uses: actions/checkout@v6 + uses: actions/checkout@v7 if: ${{ github.actor != 'dependabot[bot]' }} - name: Run Gosec Security Scanner if: ${{ github.actor != 'dependabot[bot]' }} diff --git a/.github/workflows/test-extra.yml b/.github/workflows/test-extra.yml index c02dcec44..650f464a2 100644 --- a/.github/workflows/test-extra.yml +++ b/.github/workflows/test-extra.yml @@ -50,7 +50,7 @@ jobs: parakeet-cpp: ${{ steps.detect.outputs.parakeet-cpp 
}} steps: - name: Checkout repository - uses: actions/checkout@v6 + uses: actions/checkout@v7 - name: Setup Bun uses: oven-sh/setup-bun@v2 - name: Install dependencies @@ -67,7 +67,7 @@ jobs: # runs-on: ubuntu-latest # steps: # - name: Clone - # uses: actions/checkout@v6 + # uses: actions/checkout@v7 # with: # submodules: true # - name: Dependencies @@ -90,7 +90,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -113,7 +113,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -137,7 +137,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -158,7 +158,7 @@ jobs: # runs-on: ubuntu-latest # steps: # - name: Clone - # uses: actions/checkout@v6 + # uses: actions/checkout@v7 # with: # submodules: true # - name: Dependencies @@ -178,7 +178,7 @@ jobs: # runs-on: ubuntu-latest # steps: # - name: Clone - # uses: actions/checkout@v6 + # uses: actions/checkout@v7 # with: # submodules: true # - name: Dependencies @@ -240,7 +240,7 @@ jobs: # sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true # df -h # - name: Clone - # uses: actions/checkout@v6 + # uses: actions/checkout@v7 # with: # submodules: true # - name: Dependencies @@ -265,7 +265,7 @@ jobs: # runs-on: ubuntu-latest # steps: # - name: Clone - # uses: actions/checkout@v6 + # uses: actions/checkout@v7 # with: # submodules: true # - name: Dependencies @@ -288,7 +288,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -309,7 +309,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -330,7 +330,7 @@ jobs: runs-on: 
ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -351,7 +351,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -373,7 +373,7 @@ jobs: # timeout-minutes: 45 # steps: # - name: Clone - # uses: actions/checkout@v6 + # uses: actions/checkout@v7 # with: # submodules: true # - name: Dependencies @@ -394,7 +394,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -415,7 +415,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -436,7 +436,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -462,7 +462,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -484,7 +484,7 @@ jobs: timeout-minutes: 30 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -513,7 +513,7 @@ jobs: timeout-minutes: 90 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Setup Go @@ -530,7 +530,7 @@ jobs: timeout-minutes: 90 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Setup Go @@ -552,7 +552,7 @@ jobs: timeout-minutes: 20 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Setup Go @@ -579,7 +579,7 @@ jobs: timeout-minutes: 90 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Setup Go @@ -604,7 
+604,7 @@ jobs: timeout-minutes: 90 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Setup Go @@ -625,7 +625,7 @@ jobs: timeout-minutes: 90 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Setup Go @@ -645,7 +645,7 @@ jobs: timeout-minutes: 90 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Setup Go @@ -664,7 +664,7 @@ jobs: timeout-minutes: 90 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Setup Go @@ -681,7 +681,7 @@ jobs: timeout-minutes: 90 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Setup Go @@ -698,7 +698,7 @@ jobs: timeout-minutes: 90 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Setup Go @@ -741,7 +741,7 @@ jobs: # timeout-minutes: 90 # steps: # - name: Clone - # uses: actions/checkout@v6 + # uses: actions/checkout@v7 # with: # submodules: true # - name: Dependencies @@ -783,7 +783,7 @@ jobs: # timeout-minutes: 90 # steps: # - name: Clone - # uses: actions/checkout@v6 + # uses: actions/checkout@v7 # with: # submodules: true # - name: Dependencies @@ -808,7 +808,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -840,7 +840,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -876,7 +876,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -915,7 +915,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - 
name: Dependencies @@ -952,7 +952,7 @@ jobs: timeout-minutes: 90 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -987,7 +987,7 @@ jobs: timeout-minutes: 90 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Setup Go @@ -1013,7 +1013,7 @@ jobs: timeout-minutes: 150 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -1042,7 +1042,7 @@ jobs: timeout-minutes: 60 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Setup Go @@ -1058,7 +1058,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -1091,7 +1091,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -1114,7 +1114,7 @@ jobs: timeout-minutes: 90 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies @@ -1140,7 +1140,7 @@ jobs: timeout-minutes: 90 steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index e727261c1..df5512283 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -21,7 +21,7 @@ jobs: go-version: ['1.26.x'] steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Free disk space @@ -84,7 +84,7 @@ jobs: go-version: ['1.26.x'] steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Setup Go ${{ matrix.go-version }} diff --git a/.github/workflows/tests-aio.yml b/.github/workflows/tests-aio.yml index 
162389df5..f8d3d34f0 100644 --- a/.github/workflows/tests-aio.yml +++ b/.github/workflows/tests-aio.yml @@ -62,7 +62,7 @@ jobs: sudo rm -rfv build || true df -h - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Dependencies diff --git a/.github/workflows/tests-e2e.yml b/.github/workflows/tests-e2e.yml index 90d7392d9..fb76a51e3 100644 --- a/.github/workflows/tests-e2e.yml +++ b/.github/workflows/tests-e2e.yml @@ -21,7 +21,7 @@ jobs: go-version: ['1.25.x'] steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Configure apt mirror on runner diff --git a/.github/workflows/tests-pii-ner-e2e.yml b/.github/workflows/tests-pii-ner-e2e.yml new file mode 100644 index 000000000..800f67190 --- /dev/null +++ b/.github/workflows/tests-pii-ner-e2e.yml @@ -0,0 +1,97 @@ +--- +name: 'PII NER tier E2E (live GGUF, CPU)' + +# Runs the real privacy-filter GGUF NER tier end-to-end on CPU — the gap the +# hermetic tests/e2e suite cannot cover (it only exercises the in-process +# pattern tier). Heavy (builds the C++ backend image + downloads a ~2.7 GB +# GGUF), so it is path-filtered on PRs and otherwise runs nightly / on demand. +# +# This drives the container-level harness (tests/e2e-backends) via +# `make test-extra-backend-privacy-filter`: it builds the privacy-filter image, +# downloads the model, loads it on CPU, and asserts byte-correct, UTF-8-aligned +# TokenClassify spans. The complementary HTTP-path specs in tests/e2e +# (e2e_pii_ner_test.go) Skip unless PII_NER_MODEL_GGUF is wired. 
+ +on: + workflow_dispatch: + schedule: + - cron: '0 3 * * *' + push: + branches: + - master + paths: + - 'backend/cpp/privacy-filter/**' + - 'backend/Dockerfile.privacy-filter' + - 'core/services/routing/pii/**' + - 'core/services/routing/piidetector/**' + - 'core/backend/token_classify.go' + - 'core/http/endpoints/localai/pii.go' + - 'core/schema/pii.go' + - 'tests/e2e-backends/**' + - 'tests/e2e/e2e_pii_ner_test.go' + - 'tests/e2e/e2e_suite_test.go' + - '.github/workflows/tests-pii-ner-e2e.yml' + pull_request: + paths: + - 'backend/cpp/privacy-filter/**' + - 'backend/Dockerfile.privacy-filter' + - 'core/services/routing/pii/**' + - 'core/services/routing/piidetector/**' + - 'core/backend/token_classify.go' + - 'core/http/endpoints/localai/pii.go' + - 'core/schema/pii.go' + - 'tests/e2e-backends/**' + - 'tests/e2e/e2e_pii_ner_test.go' + - 'tests/e2e/e2e_suite_test.go' + - '.github/workflows/tests-pii-ner-e2e.yml' + +concurrency: + group: ci-tests-pii-ner-e2e-${{ github.event.pull_request.number || github.sha }}-${{ github.repository }} + cancel-in-progress: ${{ github.event_name == 'pull_request' }} + +jobs: + tests-pii-ner-e2e: + runs-on: ubuntu-latest + strategy: + matrix: + go-version: ['1.25.x'] + steps: + - name: Clone + uses: actions/checkout@v7 + with: + submodules: true + - name: Free disk space + run: | + sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL || true + sudo docker image prune --all --force || true + df -h + - name: Configure apt mirror on runner + uses: ./.github/actions/configure-apt-mirror + - name: Setup Go ${{ matrix.go-version }} + uses: actions/setup-go@v5 + with: + go-version: ${{ matrix.go-version }} + cache: false + - name: Proto Dependencies + run: | + curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v26.1/protoc-26.1-linux-x86_64.zip -o protoc.zip && \ + unzip -j -d /usr/local/bin protoc.zip bin/protoc && \ + rm protoc.zip + go install 
google.golang.org/protobuf/cmd/protoc-gen-go@v1.34.2 + go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@1958fcbe2ca8bd93af633f11e97d44e567e945af + PATH="$PATH:$HOME/go/bin" make protogen-go + - name: Dependencies + run: | + sudo apt-get update + sudo apt-get install -y build-essential + # Builds local-ai-backend:privacy-filter, downloads the GGUF, loads it on + # CPU and runs the token_classify capability spec (byte-offset contract). + - name: Run live PII NER backend E2E + run: PATH="$PATH:$HOME/go/bin" make test-extra-backend-privacy-filter + - name: Setup tmate session if tests fail + if: ${{ failure() }} + uses: mxschmitt/action-tmate@v3.23 + with: + detached: true + connect-timeout-seconds: 180 + limit-access-to-actor: true diff --git a/.github/workflows/tests-ui-e2e.yml b/.github/workflows/tests-ui-e2e.yml index 99bb61e57..3c72f9dc0 100644 --- a/.github/workflows/tests-ui-e2e.yml +++ b/.github/workflows/tests-ui-e2e.yml @@ -23,7 +23,7 @@ jobs: go-version: ['1.26.x'] steps: - name: Clone - uses: actions/checkout@v6 + uses: actions/checkout@v7 with: submodules: true - name: Configure apt mirror on runner diff --git a/.github/workflows/update_swagger.yaml b/.github/workflows/update_swagger.yaml index 4b8590f05..649722dbb 100644 --- a/.github/workflows/update_swagger.yaml +++ b/.github/workflows/update_swagger.yaml @@ -10,7 +10,7 @@ jobs: fail-fast: false runs-on: ubuntu-latest steps: - - uses: actions/checkout@v6 + - uses: actions/checkout@v7 - name: Configure apt mirror on runner uses: ./.github/actions/configure-apt-mirror - uses: actions/setup-go@v5 diff --git a/.gitignore b/.gitignore index cc7d25fa6..177c79cba 100644 --- a/.gitignore +++ b/.gitignore @@ -91,3 +91,6 @@ core/http/react-ui/test-results/ # Local worktrees .worktrees/ + +# SDD / brainstorm scratch (agent-driven development) +.superpowers/ diff --git a/AGENTS.md b/AGENTS.md index 9f397e613..1095ef531 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -43,4 +43,5 @@ LocalAI follows the Linux kernel 
project's [guidelines for AI coding assistants] - **New API endpoints**: LocalAI advertises its capability surface in several independent places — swagger `@Tags`, `/api/instructions` registry, auth `RouteFeatureRegistry`, React UI `capabilities.js`, docs. Read [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) and follow its checklist — missing any surface means clients, admins, and the UI won't know the endpoint exists. - **Admin endpoints → MCP tool**: every admin endpoint that an admin would manage conversationally (install/list/edit/toggle/upgrade) MUST also be exposed as an MCP tool in `pkg/mcp/localaitools/`. The LocalAI Assistant chat modality and the standalone `local-ai mcp-server` consume that package; drift between REST and MCP is a real risk. Read [.agents/localai-assistant-mcp.md](.agents/localai-assistant-mcp.md) — the `TestToolHTTPRouteMappingComplete` test fails until you wire the new tool and update the route map. - **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds +- **Backend OS coverage**: a new backend must target every OS it can build for, not just Linux. `.github/backend-matrix.yml` has two matrices — `include:` (Linux) and `includeDarwin:` (macOS / Apple Silicon). Most C/C++/GGML and many Python backends build on Darwin too — wire the `includeDarwin` entry + `backend/index.yaml` `metal:` entries, or say in the PR why an OS is unsupported. See the darwin checklist in [.agents/adding-backends.md](.agents/adding-backends.md). - **UI**: The active UI is the React app in `core/http/react-ui/`. 
The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI diff --git a/Makefile b/Makefile index 8da9aacee..be0711b47 100644 --- a/Makefile +++ b/Makefile @@ -690,6 +690,16 @@ test-extra-backend-llama-cpp-transcription: docker-build-llama-cpp BACKEND_TEST_CTX_SIZE=2048 \ $(MAKE) test-extra-backend +## privacy-filter: the PII/NER token-classification backend. Exercises the +## TokenClassify RPC and asserts byte-correct, UTF-8-aligned span offsets +## against the openai-privacy-filter multilingual GGUF (CPU-runnable, ~50M +## active params). This is the live-backend coverage for the PII NER tier. +test-extra-backend-privacy-filter: docker-build-privacy-filter + BACKEND_IMAGE=local-ai-backend:privacy-filter \ + BACKEND_TEST_MODEL_URL=https://huggingface.co/LocalAI-io/privacy-filter-multilingual-GGUF/resolve/main/privacy-filter-multilingual-f16.gguf \ + BACKEND_TEST_CAPS=health,load,token_classify \ + $(MAKE) test-extra-backend + ## vllm is resolved from a HuggingFace model id (no file download) and ## exercises Predict + streaming + tool-call extraction via the hermes parser. ## Requires a host CPU with the SIMD instructions the prebuilt vllm CPU diff --git a/README.md b/README.md index b05af2dfb..f7843950d 100644 --- a/README.md +++ b/README.md @@ -231,6 +231,7 @@ Most backends wrap a best-in-class upstream engine. 
A handful of them are native | Backend | What it does | |---------|-------------| | [parakeet.cpp](https://github.com/mudler/parakeet.cpp) | C++/GGML port of NVIDIA NeMo Parakeet ASR (tdt/ctc/rnnt/hybrid), with cache-aware streaming transcription | +| [ced.cpp](https://github.com/mudler/ced.cpp) | C++/GGML port of the CED audio-tagging models: sound-event classification (527 AudioSet classes) over REST, plus the realtime API for live recognition | | [voxtral.c](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in pure C | | [vibevoice.cpp](https://github.com/mudler/vibevoice.cpp) | Native port of Microsoft VibeVoice for TTS (voice cloning) and long-form ASR with speaker diarization | | [rf-detr.cpp](https://github.com/mudler/rf-detr.cpp) | Native RF-DETR object detection and instance segmentation | @@ -240,6 +241,8 @@ Most backends wrap a best-in-class upstream engine. A handful of them are native | [LocalVQE](https://github.com/localai-org/LocalVQE) | Joint acoustic echo cancellation, noise suppression, and dereverberation | | [local-store](https://github.com/mudler/LocalAI) | Local-first vector database for embeddings (shipped in-tree) | +We also maintain [apex-quant](https://github.com/localai-org/apex-quant), a per-tensor, per-layer quantization recipe for Mixture-of-Experts models that exploits their structural sparsity to produce GGUFs that match or beat Q8_0 quality and run out of the box on stock llama.cpp.
+ ## Resources - [Documentation](https://localai.io/) diff --git a/backend/Dockerfile.golang b/backend/Dockerfile.golang index 75fc3a0d9..d188cdf70 100644 --- a/backend/Dockerfile.golang +++ b/backend/Dockerfile.golang @@ -65,7 +65,12 @@ RUN </dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1) ARCH?=$(shell uname -m) -# Disable Shared libs as we are linking on static gRPC and we can't mix shared and static -CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF -DLLAMA_CURL=OFF +# Shared libs default to OFF: we link static gRPC and the avx/avx2/avx512/fallback +# variants are fully static. The CPU_ALL_VARIANTS build flips SHARED_LIBS=ON (ggml/llama +# become shared so the dynamic CPU backends work; gRPC stays static via its imported +# targets). SHARED_LIBS is a make variable, not an appended -D, so it survives the +# recursive sub-make into the VARIANT build dir (which re-parses this Makefile) instead +# of being re-clobbered by a second -DBUILD_SHARED_LIBS=OFF. EXTRA_CMAKE_ARGS is the hook +# the CPU_ALL_VARIANTS target uses to inject -DGGML_BACKEND_DL/-DGGML_CPU_ALL_VARIANTS. +SHARED_LIBS?=OFF +EXTRA_CMAKE_ARGS?= +CMAKE_ARGS+=-DBUILD_SHARED_LIBS=$(SHARED_LIBS) -DLLAMA_CURL=OFF $(EXTRA_CMAKE_ARGS) CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST)))) ifeq ($(NATIVE),false) @@ -128,6 +136,30 @@ llama-cpp-fallback: llama.cpp CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) VARIANT="llama-cpp-fallback-build" build-llama-cpp-grpc-server cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-fallback-build/grpc-server llama-cpp-fallback +# Single-build CPU backend using ggml's CPU_ALL_VARIANTS. Produces ONE grpc-server +# plus a set of dlopen-able libggml-cpu-*.so (sandybridge/haswell/skylakex/...) that +# ggml's backend registry selects from at runtime by probing host CPU features. +# Replaces the avx/avx2/avx512/fallback multi-binary build on x86. 
+# +# CPU_ALL_VARIANTS requires GGML_BACKEND_DL, which requires BUILD_SHARED_LIBS=ON, so we +# pass SHARED_LIBS=ON and the DL flags as make variables (NOT pre-expanded into the +# CMAKE_ARGS env string): command-line make variables propagate through every recursive +# sub-make, so the deepest VARIANT-dir build computes BUILD_SHARED_LIBS=ON consistently. +# Only ggml/llama go shared - gRPC is found via its static imported targets, so the +# grpc-server binary keeps static gRPC and only dynamically links ggml. +# +# TARGET adds "ggml": the per-microarch backends are runtime-dlopened, not link deps of +# grpc-server, so they only build because each is an add_dependencies() of the ggml target. +llama-cpp-cpu-all: llama.cpp + cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-cpu-all-build + $(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-cpu-all-build purge + $(info ${GREEN}I llama-cpp build info:cpu-all-variants${RESET}) + $(MAKE) SHARED_LIBS=ON EXTRA_CMAKE_ARGS="-DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON" TARGET="--target grpc-server --target ggml" VARIANT="llama-cpp-cpu-all-build" build-llama-cpp-grpc-server + cp -rfv $(CURRENT_MAKEFILE_DIR)/../llama-cpp-cpu-all-build/grpc-server llama-cpp-cpu-all + rm -rf ggml-shared-libs && mkdir -p ggml-shared-libs + find $(CURRENT_MAKEFILE_DIR)/../llama-cpp-cpu-all-build/llama.cpp/build \( -name '*.so*' -o -name '*.dylib' \) -exec cp -av {} ggml-shared-libs/ \; + @echo "Collected ggml shared backends:" && ls -la ggml-shared-libs/ + llama-cpp-grpc: llama.cpp cp -rf $(CURRENT_MAKEFILE_DIR)/../llama-cpp $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build $(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../llama-cpp-grpc-build purge diff --git a/backend/cpp/llama-cpp/grpc-server.cpp b/backend/cpp/llama-cpp/grpc-server.cpp index ceb2e8daf..e2b1f7940 100644 --- a/backend/cpp/llama-cpp/grpc-server.cpp +++ b/backend/cpp/llama-cpp/grpc-server.cpp @@ -18,6 +18,18 @@ #if __has_include("server-chat.cpp") #include 
"server-chat.cpp" #endif +// server-schema.cpp exists only in llama.cpp after the upstream refactor that +// extracted the JSON request-schema evaluation (previously the static +// server_task::params_from_json_cmpl) into server_schema::eval_llama_cmpl_schema. +// server-context.cpp and grpc-server.cpp both call into it, so its definitions +// must be part of this translation unit or the link fails. __has_include keeps +// the source compatible with older pins/forks (e.g. llama-cpp-turboquant) that +// predate the split and still expose params_from_json_cmpl (see the guarded +// call sites below). +#if __has_include("server-schema.cpp") +#define LOCALAI_HAS_SERVER_SCHEMA 1 +#include "server-schema.cpp" +#endif #include "server-context.cpp" // LocalAI @@ -25,6 +37,7 @@ #include "backend.pb.h" #include "backend.grpc.pb.h" #include "common.h" +#include "arg.h" #include "chat-auto-parser.h" #include #include @@ -580,6 +593,10 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt params.checkpoint_min_step = 256; #endif + // Raw upstream llama-server flags collected from any option entry that + // starts with '-'. Applied once after the loop via common_params_parse. + std::vector extra_argv; + // decode options. Options are in form optname:optvale, or if booleans only optname. for (int i = 0; i < request->options_size(); i++) { std::string opt = request->options(i); @@ -1159,6 +1176,31 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt } catch (...) {} } + // --- main model MoE on CPU (upstream --cpu-moe / --n-cpu-moe) --- + } else if (!strcmp(optname, "cpu_moe")) { + // Bool-style flag: keep all MoE expert weights on CPU. 
+ const bool enable = (optval == NULL) || + optval_str == "true" || optval_str == "1" || optval_str == "yes" || + optval_str == "on" || optval_str == "enabled"; + if (enable) { + params.tensor_buft_overrides.push_back(llm_ffn_exps_cpu_override()); + } + } else if (!strcmp(optname, "n_cpu_moe")) { + if (optval != NULL) { + try { + int n = std::stoi(optval_str); + if (n < 0) n = 0; + // Keep override-name storage alive for the lifetime of the + // params struct (mirrors upstream arg.cpp's function-local static). + static std::list buft_overrides_main; + for (int i = 0; i < n; ++i) { + buft_overrides_main.push_back(llm_ffn_exps_block_regex(i)); + params.tensor_buft_overrides.push_back( + {buft_overrides_main.back().c_str(), ggml_backend_cpu_buffer_type()}); + } + } catch (...) {} + } + // --- draft model tensor buffer overrides (upstream --spec-draft-override-tensor) --- } else if (!strcmp(optname, "draft_override_tensor") || !strcmp(optname, "spec_draft_override_tensor")) { // Format: =,=,... @@ -1190,6 +1232,30 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt else { cur.push_back(c); } } if (!cur.empty()) flush(cur); + + // --- generic passthrough: any entry starting with '-' is a raw + // upstream llama-server flag, forwarded verbatim to the parser. --- + } else if (optname[0] == '-') { + std::string flag = optname; + // These flags make upstream's parser exit() (printing usage / + // completion), which would kill the backend process. Skip them. + if (flag == "-h" || flag == "--help" || flag == "--usage" || + flag == "--version" || flag == "--license" || + flag == "--list-devices" || flag == "-cl" || + flag == "--cache-list" || + flag.rfind("--completion", 0) == 0) { + fprintf(stderr, + "[llama-cpp] ignoring passthrough flag that would exit: %s\n", + flag.c_str()); + } else { + extra_argv.push_back(flag); + // Preserve the whole value after the first ':' so embedded + // colons (e.g. 
host:port) survive strtok's truncation of optval. + auto colon = opt.find(':'); + if (colon != std::string::npos) { + extra_argv.push_back(opt.substr(colon + 1)); + } + } } } @@ -1225,27 +1291,6 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt } } - if (!params.kv_overrides.empty()) { - params.kv_overrides.emplace_back(); - params.kv_overrides.back().key[0] = 0; - } - - // tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp). - // Real entries are pushed during option parsing; here we pad/terminate so the - // model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543) - // and so llama_params_fit has the placeholder slots it requires. - { - const size_t ntbo = llama_max_tensor_buft_overrides(); - while (params.tensor_buft_overrides.size() < ntbo) { - params.tensor_buft_overrides.push_back({nullptr, nullptr}); - } - } - // Terminate the draft tensor_buft_overrides list with a sentinel, mirroring - // the main-model handling above. - if (!params.speculative.draft.tensor_buft_overrides.empty()) { - params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr}); - } - // TODO: Add yarn if (!request->tensorsplit().empty()) { @@ -1338,6 +1383,69 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt params.sampling.grammar_triggers.push_back(std::move(trigger)); } } + + // Apply any raw upstream flags last so an explicit passthrough flag wins + // over the LocalAI-resolved field it maps to (e.g. --ctx-size beats + // context_size). This is the same parser llama-server itself uses. + if (!extra_argv.empty()) { + // common_params_parser_init resets a few fields for the SERVER example + // (n_parallel -> -1, use_color). Snapshot n_parallel so an unrelated + // passthrough flag can't silently clobber LocalAI's resolved value. 
+ const int saved_n_parallel = params.n_parallel; + + std::vector argv; + std::string prog = "llama-server"; + argv.push_back(prog.data()); + for (auto & a : extra_argv) { + argv.push_back(a.data()); + } + + // ctx_arg.params is a reference, so this overlays the given flags onto + // `params` in place. Returns false on a recoverable parse error (and + // self-restores params); may exit() on a hard error, exactly as + // passing the same bad flag to llama-server would. + if (!common_params_parse((int)argv.size(), argv.data(), params, + LLAMA_EXAMPLE_SERVER)) { + fprintf(stderr, + "[llama-cpp] failed to parse passthrough options; ignoring them\n"); + } + + // Restore n_parallel unless a passthrough flag explicitly set it + // (parser_init's reset sentinel for SERVER is -1). + if (params.n_parallel == -1) { + params.n_parallel = saved_n_parallel; + } + } + + // Terminate/pad the override vectors only after BOTH the named-option loop + // and the generic passthrough (common_params_parse above) have pushed their + // real entries, so back() is the null sentinel the model loader asserts on. + // Running these before the passthrough let a passthrough flag (--cpu-moe, + // --override-tensor, --override-kv, ...) append a real entry after the + // sentinel: a GGML_ASSERT crash for tensor_buft_overrides, a silent drop for + // kv_overrides. Double-termination is harmless (the while is a no-op if the + // passthrough parse already padded; an extra trailing null is ignored). + + if (!params.kv_overrides.empty()) { + params.kv_overrides.emplace_back(); + params.kv_overrides.back().key[0] = 0; + } + + // tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp). + // Real entries are pushed during option parsing; here we pad/terminate so the + // model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543) + // and so llama_params_fit has the placeholder slots it requires. 
+ { + const size_t ntbo = llama_max_tensor_buft_overrides(); + while (params.tensor_buft_overrides.size() < ntbo) { + params.tensor_buft_overrides.push_back({nullptr, nullptr}); + } + } + // Terminate the draft tensor_buft_overrides list with a sentinel, mirroring + // the main-model handling above. + if (!params.speculative.draft.tensor_buft_overrides.empty()) { + params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr}); + } } @@ -2193,7 +2301,11 @@ public: task.index = i; task.tokens = std::move(inputs[i]); +#ifdef LOCALAI_HAS_SERVER_SCHEMA + task.params = server_schema::eval_llama_cmpl_schema( +#else task.params = server_task::params_from_json_cmpl( +#endif ctx_server.impl->vocab, params_base, ctx_server.get_meta().slot_n_ctx, @@ -2207,7 +2319,7 @@ public: // cannot detect tool calls or separate reasoning from content. task.params.res_type = TASK_RESPONSE_TYPE_OAI_CHAT; task.params.oaicompat_cmpl_id = completion_id; - // oaicompat_model is already populated by params_from_json_cmpl + // oaicompat_model is already populated by eval_llama_cmpl_schema tasks.push_back(std::move(task)); } @@ -3031,7 +3143,11 @@ public: task.index = i; task.tokens = std::move(inputs[i]); +#ifdef LOCALAI_HAS_SERVER_SCHEMA + task.params = server_schema::eval_llama_cmpl_schema( +#else task.params = server_task::params_from_json_cmpl( +#endif ctx_server.impl->vocab, params_base, ctx_server.get_meta().slot_n_ctx, @@ -3043,7 +3159,7 @@ public: // reasoning, tool calls, and content are classified into ChatDeltas. 
task.params.res_type = TASK_RESPONSE_TYPE_OAI_CHAT; task.params.oaicompat_cmpl_id = completion_id; - // oaicompat_model is already populated by params_from_json_cmpl + // oaicompat_model is already populated by eval_llama_cmpl_schema tasks.push_back(std::move(task)); } diff --git a/backend/cpp/llama-cpp/package.sh b/backend/cpp/llama-cpp/package.sh index d1897e6be..5d2b18c5b 100755 --- a/backend/cpp/llama-cpp/package.sh +++ b/backend/cpp/llama-cpp/package.sh @@ -14,6 +14,22 @@ mkdir -p $CURDIR/package/lib cp -avrf $CURDIR/llama-cpp-* $CURDIR/package/ cp -rfv $CURDIR/run.sh $CURDIR/package/ +# Bundle the ggml shared backends produced by the CPU_ALL_VARIANTS build (libggml-base.so, +# libggml.so, libllama.so and the per-microarch libggml-cpu-*.so), all into package/lib. +# +# Two distinct resolution mechanisms both land here: +# - NEEDED deps (libggml-base/libggml/libllama): resolved by the dynamic linker via the +# LD_LIBRARY_PATH=$CURDIR/lib that run.sh exports. +# - The per-microarch libggml-cpu-*.so are NOT linked; ggml *discovers* them at runtime by +# scanning the executable's own directory (readlink /proc/self/exe). run.sh launches via +# the bundled $CURDIR/lib/ld.so, so /proc/self/exe -> .../lib/ld.so and ggml scans lib/. +# That is why the variants must sit in lib/ (next to ld.so), not just on the link path. +# No-op on builds (arm64/darwin) that don't produce the all-variants set. +if [ -d "$CURDIR/ggml-shared-libs" ]; then + echo "Bundling ggml shared backends (CPU_ALL_VARIANTS)..." 
+ cp -avf $CURDIR/ggml-shared-libs/*.so* $CURDIR/package/lib/ +fi + # Detect architecture and copy appropriate libraries if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then # x86_64 architecture diff --git a/backend/cpp/llama-cpp/run.sh b/backend/cpp/llama-cpp/run.sh index 553faeb27..db8498f4b 100755 --- a/backend/cpp/llama-cpp/run.sh +++ b/backend/cpp/llama-cpp/run.sh @@ -12,26 +12,12 @@ grep -e "flags" /proc/cpuinfo | head -1 BINARY=llama-cpp-fallback -if grep -q -e "\savx\s" /proc/cpuinfo ; then - echo "CPU: AVX found OK" - if [ -e $CURDIR/llama-cpp-avx ]; then - BINARY=llama-cpp-avx - fi -fi - -if grep -q -e "\savx2\s" /proc/cpuinfo ; then - echo "CPU: AVX2 found OK" - if [ -e $CURDIR/llama-cpp-avx2 ]; then - BINARY=llama-cpp-avx2 - fi -fi - -# Check avx 512 -if grep -q -e "\savx512f\s" /proc/cpuinfo ; then - echo "CPU: AVX512F found OK" - if [ -e $CURDIR/llama-cpp-avx512 ]; then - BINARY=llama-cpp-avx512 - fi +# CPU images (x86, arm64, darwin) ship a single llama-cpp-cpu-all built with ggml +# CPU_ALL_VARIANTS: ggml's backend registry dlopens the best libggml-cpu-*.so for this +# host, so no shell-side AVX probing. GPU images (cublas/sycl/vulkan/hipblas) ship only +# llama-cpp-fallback (the accelerator does the compute), so fall back to it when absent. +if [ -e $CURDIR/llama-cpp-cpu-all ]; then + BINARY=llama-cpp-cpu-all fi if [ -n "$LLAMACPP_GRPC_SERVERS" ]; then diff --git a/backend/cpp/privacy-filter/Makefile b/backend/cpp/privacy-filter/Makefile index 173d4176b..774f2c433 100644 --- a/backend/cpp/privacy-filter/Makefile +++ b/backend/cpp/privacy-filter/Makefile @@ -8,7 +8,7 @@ # Local development: point at a working checkout instead of cloning, e.g. 
# make PRIVACY_FILTER_SRC=$HOME/c/privacy-filter.cpp grpc-server -PRIVACY_FILTER_VERSION?=646342f7a59c6b7d195185eac60bad762e572f1d +PRIVACY_FILTER_VERSION?=98f52c5ef2250f207cc6b9a6aef05393a120cb7c PRIVACY_FILTER_REPO?=https://github.com/localai-org/privacy-filter.cpp PRIVACY_FILTER_SRC?= diff --git a/backend/cpp/turboquant/Makefile b/backend/cpp/turboquant/Makefile index 98f5e4978..a32adf0b6 100644 --- a/backend/cpp/turboquant/Makefile +++ b/backend/cpp/turboquant/Makefile @@ -65,6 +65,29 @@ turboquant-avx: turboquant-fallback: $(call turboquant-build,fallback,-DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off,--target grpc-server) +# Single-build CPU backend via ggml CPU_ALL_VARIANTS (mirrors llama-cpp-cpu-all). +# turboquant reuses backend/cpp/llama-cpp's CMakeLists.txt (hw_grpc_proto STATIC) and +# Makefile (SHARED_LIBS make-var + EXTRA_CMAKE_ARGS), so this passes the same overrides +# through to the copied build: SHARED_LIBS=ON, the DL flags, and --target ggml (which +# pulls in the per-microarch libggml-cpu-*.so via ggml's add_dependencies). The .so set +# is collected for package.sh to bundle into package/lib. 
+turboquant-cpu-all: + rm -rf $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build + cp -rf $(LLAMA_CPP_DIR) $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build + $(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build purge + bash $(CURRENT_MAKEFILE_DIR)/patch-grpc-server.sh $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build/grpc-server.cpp + $(info $(GREEN)I turboquant build info:cpu-all-variants$(RESET)) + LLAMA_REPO=$(LLAMA_REPO) LLAMA_VERSION=$(TURBOQUANT_VERSION) \ + $(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build llama.cpp + bash $(CURRENT_MAKEFILE_DIR)/apply-patches.sh $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build/llama.cpp $(PATCHES_DIR) + SHARED_LIBS=ON EXTRA_CMAKE_ARGS="-DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON" TARGET="--target grpc-server --target ggml" \ + LLAMA_REPO=$(LLAMA_REPO) LLAMA_VERSION=$(TURBOQUANT_VERSION) \ + $(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build grpc-server + cp -rfv $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build/grpc-server turboquant-cpu-all + rm -rf ggml-shared-libs && mkdir -p ggml-shared-libs + find $(CURRENT_MAKEFILE_DIR)/../turboquant-cpu-all-build/llama.cpp/build \( -name '*.so*' -o -name '*.dylib' \) -exec cp -av {} ggml-shared-libs/ \; + @echo "Collected ggml shared backends:" && ls -la ggml-shared-libs/ + turboquant-grpc: $(call turboquant-build,grpc,-DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off,--target grpc-server --target rpc-server) diff --git a/backend/cpp/turboquant/package.sh b/backend/cpp/turboquant/package.sh index d5402fc31..c4559a68d 100755 --- a/backend/cpp/turboquant/package.sh +++ b/backend/cpp/turboquant/package.sh @@ -14,6 +14,15 @@ mkdir -p $CURDIR/package/lib cp -avrf $CURDIR/turboquant-* $CURDIR/package/ cp -rfv $CURDIR/run.sh $CURDIR/package/ +# Bundle the ggml shared backends from the CPU_ALL_VARIANTS build into package/lib. 
ggml +# discovers the per-microarch libggml-cpu-*.so by scanning the executable directory, which +# (via the bundled lib/ld.so that run.sh launches through) resolves to lib/. See the +# matching comment in backend/cpp/llama-cpp/package.sh. No-op on the fallback/ROCm builds. +if [ -d "$CURDIR/ggml-shared-libs" ]; then + echo "Bundling ggml shared backends (CPU_ALL_VARIANTS)..." + cp -avf $CURDIR/ggml-shared-libs/*.so* $CURDIR/package/lib/ +fi + # Detect architecture and copy appropriate libraries if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then # x86_64 architecture diff --git a/backend/cpp/turboquant/run.sh b/backend/cpp/turboquant/run.sh index b0239e237..cd41a0f7f 100755 --- a/backend/cpp/turboquant/run.sh +++ b/backend/cpp/turboquant/run.sh @@ -12,26 +12,11 @@ grep -e "flags" /proc/cpuinfo | head -1 BINARY=turboquant-fallback -if grep -q -e "\savx\s" /proc/cpuinfo ; then - echo "CPU: AVX found OK" - if [ -e $CURDIR/turboquant-avx ]; then - BINARY=turboquant-avx - fi -fi - -if grep -q -e "\savx2\s" /proc/cpuinfo ; then - echo "CPU: AVX2 found OK" - if [ -e $CURDIR/turboquant-avx2 ]; then - BINARY=turboquant-avx2 - fi -fi - -# Check avx 512 -if grep -q -e "\savx512f\s" /proc/cpuinfo ; then - echo "CPU: AVX512F found OK" - if [ -e $CURDIR/turboquant-avx512 ]; then - BINARY=turboquant-avx512 - fi +# x86/arm64 ship a single turboquant-cpu-all built with ggml CPU_ALL_VARIANTS: ggml's +# backend registry dlopens the best libggml-cpu-*.so for this host, so no shell-side +# probing. ROCm ships only turboquant-fallback, so fall back to it when cpu-all is absent. +if [ -e $CURDIR/turboquant-cpu-all ]; then + BINARY=turboquant-cpu-all fi if [ -n "$LLAMACPP_GRPC_SERVERS" ]; then diff --git a/backend/go/acestep-cpp/Makefile b/backend/go/acestep-cpp/Makefile index 0b1929b94..3332ce1b6 100644 --- a/backend/go/acestep-cpp/Makefile +++ b/backend/go/acestep-cpp/Makefile @@ -117,7 +117,8 @@ libgoacestepcpp-custom: CMakeLists.txt cpp/goacestepcpp.cpp cpp/goacestepcpp.h cmake .. 
$(CMAKE_ARGS) && \ cmake --build . --config Release -j$(JOBS) --target goacestepcpp && \ cd .. && \ - mv build-$(SO_TARGET)/libgoacestepcpp.so ./$(SO_TARGET) + (mv build-$(SO_TARGET)/libgoacestepcpp.so ./$(SO_TARGET) 2>/dev/null || \ + mv build-$(SO_TARGET)/libgoacestepcpp.dylib ./$(SO_TARGET) 2>/dev/null) test: acestep-cpp @echo "Running acestep-cpp tests..." diff --git a/backend/go/acestep-cpp/main.go b/backend/go/acestep-cpp/main.go index c65afb335..e4c1378b8 100644 --- a/backend/go/acestep-cpp/main.go +++ b/backend/go/acestep-cpp/main.go @@ -4,6 +4,7 @@ package main import ( "flag" "os" + "runtime" "github.com/ebitengine/purego" grpc "github.com/mudler/LocalAI/pkg/grpc" @@ -22,7 +23,11 @@ func main() { // Get library name from environment variable, default to fallback libName := os.Getenv("ACESTEP_LIBRARY") if libName == "" { - libName = "./libgoacestepcpp-fallback.so" + if runtime.GOOS == "darwin" { + libName = "./libgoacestepcpp-fallback.dylib" + } else { + libName = "./libgoacestepcpp-fallback.so" + } } gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) diff --git a/backend/go/acestep-cpp/package.sh b/backend/go/acestep-cpp/package.sh index d922c5b86..5fecf3455 100755 --- a/backend/go/acestep-cpp/package.sh +++ b/backend/go/acestep-cpp/package.sh @@ -13,6 +13,7 @@ mkdir -p $CURDIR/package/lib cp -avf $CURDIR/acestep-cpp $CURDIR/package/ cp -fv $CURDIR/libgoacestepcpp-*.so $CURDIR/package/ +cp -fv $CURDIR/libgoacestepcpp-*.dylib $CURDIR/package/ 2>/dev/null || true cp -fv $CURDIR/run.sh $CURDIR/package/ # Detect architecture and copy appropriate libraries diff --git a/backend/go/acestep-cpp/run.sh b/backend/go/acestep-cpp/run.sh index d901e2c85..bcdfbc09e 100755 --- a/backend/go/acestep-cpp/run.sh +++ b/backend/go/acestep-cpp/run.sh @@ -12,9 +12,19 @@ if [ "$(uname)" != "Darwin" ]; then grep -e "flags" /proc/cpuinfo | head -1 fi -LIBRARY="$CURDIR/libgoacestepcpp-fallback.so" +if [ "$(uname)" = "Darwin" ]; then + # macOS: single library 
variant (Metal or Accelerate). The library + # suffix depends on the CMake target type: on Apple a SHARED library is + # emitted as .dylib but a MODULE as .so, so prefer .dylib and fall + # back to .so. + LIBRARY="$CURDIR/libgoacestepcpp-fallback.dylib" + if [ ! -e "$LIBRARY" ]; then + LIBRARY="$CURDIR/libgoacestepcpp-fallback.so" + fi + export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH +else + LIBRARY="$CURDIR/libgoacestepcpp-fallback.so" -if [ "$(uname)" != "Darwin" ]; then if grep -q -e "\savx\s" /proc/cpuinfo ; then echo "CPU: AVX found OK" if [ -e $CURDIR/libgoacestepcpp-avx.so ]; then @@ -36,9 +46,10 @@ if [ "$(uname)" != "Darwin" ]; then LIBRARY="$CURDIR/libgoacestepcpp-avx512.so" fi fi + + export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH fi -export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH export ACESTEP_LIBRARY=$LIBRARY # If there is a lib/ld.so, use it diff --git a/backend/go/ced/.gitignore b/backend/go/ced/.gitignore new file mode 100644 index 000000000..5e47da6c5 --- /dev/null +++ b/backend/go/ced/.gitignore @@ -0,0 +1,11 @@ +.cache/ +sources/ +build/ +package/ +ced-grpc +# build artifacts staged in-tree by the Makefile (cp from sources/) or +# symlinked for local dev; the real sources live in ced.cpp upstream. +*.so +*.so.* +ced_capi.h +compile_commands.json diff --git a/backend/go/ced/Makefile b/backend/go/ced/Makefile new file mode 100644 index 000000000..2b15990ec --- /dev/null +++ b/backend/go/ced/Makefile @@ -0,0 +1,78 @@ +# ced sound-classification backend Makefile. +# +# Upstream pin lives below as CED_VERSION?= so .github/bump_deps.sh can find +# and update it (matches the parakeet-cpp / whisper.cpp convention). +# +# Local dev shortcut: symlink an out-of-tree ced.cpp shared build + header and +# skip the clone/cmake steps entirely: +# ln -sf /path/to/ced.cpp/build-shared/libced.so . +# ln -sf /path/to/ced.cpp/include/ced_capi.h . +# go build -o ced-grpc .
+ +CED_VERSION?=c04ac14b7992d00584d9e812c9bb6268598a6ce7 +CED_REPO?=https://github.com/mudler/ced.cpp + +GOCMD?=go +GO_TAGS?= +JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4) + +BUILD_TYPE?= +NATIVE?=false + +# Static-link ggml into libced.so (PIC) so the shared lib is self-contained: +# dlopen needs no libggml*.so alongside it, only system libs the runtime image +# already provides. +CMAKE_ARGS?=-DCMAKE_BUILD_TYPE=Release -DCED_SHARED=ON -DCED_BUILD_CLI=OFF -DCED_BUILD_TESTS=OFF -DBUILD_SHARED_LIBS=OFF -DCMAKE_POSITION_INDEPENDENT_CODE=ON + +ifeq ($(NATIVE),false) + CMAKE_ARGS+=-DGGML_NATIVE=OFF +endif + +# ced.cpp gates its ggml backends behind CED_GGML_* options (set(... CACHE BOOL +# "" FORCE)), so forward those instead of a bare -DGGML_CUDA=ON. +ifeq ($(BUILD_TYPE),cublas) + CMAKE_ARGS+=-DCED_GGML_CUDA=ON -DGGML_CUDA_GRAPHS=ON +else ifeq ($(BUILD_TYPE),openblas) + CMAKE_ARGS+=-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS +else ifeq ($(BUILD_TYPE),hipblas) + CMAKE_ARGS+=-DCED_GGML_HIP=ON +else ifeq ($(BUILD_TYPE),vulkan) + CMAKE_ARGS+=-DCED_GGML_VULKAN=ON +endif + +.PHONY: ced-grpc package build clean purge test all + +all: ced-grpc + +sources/ced.cpp: + mkdir -p sources/ced.cpp + cd sources/ced.cpp && \ + git init -q && \ + git remote add origin $(CED_REPO) && \ + git fetch --depth 1 origin $(CED_VERSION) && \ + git checkout FETCH_HEAD && \ + git submodule update --init --recursive --depth 1 --single-branch + +libced.so: sources/ced.cpp + cmake -B sources/ced.cpp/build-shared -S sources/ced.cpp $(CMAKE_ARGS) + cmake --build sources/ced.cpp/build-shared --config Release -j$(JOBS) + cp -fv sources/ced.cpp/build-shared/libced.so* ./ 2>/dev/null || true + cp -fv sources/ced.cpp/build-shared/libced.dylib ./ 2>/dev/null || true + cp -fv sources/ced.cpp/include/ced_capi.h ./ + +ced-grpc: libced.so main.go goced.go + CGO_ENABLED=0 $(GOCMD) build -tags "$(GO_TAGS)" -o ced-grpc . 
+ +package: ced-grpc + bash package.sh + +build: package + +test: + LD_LIBRARY_PATH=$(CURDIR):$$LD_LIBRARY_PATH $(GOCMD) test ./... -count=1 + +clean: purge + rm -rf libced.so* libced.dylib ced_capi.h package ced-grpc + +purge: + rm -rf sources/ced.cpp diff --git a/backend/go/ced/goced.go b/backend/go/ced/goced.go new file mode 100644 index 000000000..a405bf017 --- /dev/null +++ b/backend/go/ced/goced.go @@ -0,0 +1,130 @@ +package main + +// Go side of the ced backend: purego bindings over ced_capi.h plus the gRPC +// SoundDetection implementation. +// +// SKETCH: the pb.SoundDetection* types come from backend.proto (regenerate with +// `make protogen-go`). The C side is single-threaded per ctx, so we guard the +// engine with engineMu; LocalAI also serializes via base.SingleThread. +import ( + "context" + "encoding/json" + "errors" + "fmt" + "sort" + "sync" + "unsafe" + + "github.com/mudler/LocalAI/pkg/grpc/base" + pb "github.com/mudler/LocalAI/pkg/grpc/proto" +) + +// purego-bound entry points from libced.so. Names match ced_capi.h exactly. +var ( + CppAbiVersion func() int32 + CppLoad func(ggufPath string) uintptr + CppFree func(ctx uintptr) + CppLastError func(ctx uintptr) string + CppNumClasses func(ctx uintptr) int32 + CppSampleRate func(ctx uintptr) int32 + CppClassifyPathJSON func(ctx uintptr, wavPath string, topK int32) uintptr + CppClassifyPcmJSON func(ctx uintptr, pcm []float32, nSamples int32, sampleRate int32, topK int32) uintptr + CppFreeString func(s uintptr) +) + +// cstr copies a malloc'd C string (returned as uintptr) into a Go string and +// frees the original via ced_capi_free_string. Empty/0 -> "".
+func cstr(p uintptr) string { + if p == 0 { + return "" + } + defer CppFreeString(p) + var b []byte + for i := 0; ; i++ { + ch := *(*byte)(unsafe.Pointer(p + uintptr(i))) //nolint:govet // #nosec G103 -- C-owned NUL-terminated string from libced (not Go-GC memory) + if ch == 0 { + break + } + b = append(b, ch) + } + return string(b) +} + +// Ced is the gRPC backend. One loaded CED model per instance. +type Ced struct { + base.Base + ctxPtr uintptr + engineMu sync.Mutex +} + +// Load resolves the GGUF and opens the C-API context. +func (c *Ced) Load(opts *pb.ModelOptions) error { + if opts.ModelFile == "" { + return errors.New("ced: ModelFile is required") + } + ctx := CppLoad(opts.ModelFile) + if ctx == 0 { + return fmt.Errorf("ced: ced_capi_load failed for %q: %s", opts.ModelFile, CppLastError(0)) + } + c.ctxPtr = ctx + return nil +} + +// jsonTag mirrors the ced_capi JSON tag objects. +type jsonTag struct { + Index int `json:"index"` + Score float32 `json:"score"` + Label string `json:"label"` +} + +// SoundDetection classifies the clip at req.Src and returns scored AudioSet tags. 
+func (c *Ced) SoundDetection(ctx context.Context, req *pb.SoundDetectionRequest) (*pb.SoundDetectionResponse, error) { + if c.ctxPtr == 0 { + return nil, errors.New("ced: model not loaded") + } + if req.GetSrc() == "" { + return nil, errors.New("ced: SoundDetectionRequest.src (audio path) is required") + } + topK := req.GetTopK() + if topK <= 0 { + topK = 10 // sensible default for a tagging response + } + + c.engineMu.Lock() + out := cstr(CppClassifyPathJSON(c.ctxPtr, req.GetSrc(), topK)) + lastErr := CppLastError(c.ctxPtr) + c.engineMu.Unlock() + + if out == "" { + return nil, fmt.Errorf("ced: classification failed: %s", lastErr) + } + var tags []jsonTag + if err := json.Unmarshal([]byte(out), &tags); err != nil { + return nil, fmt.Errorf("ced: bad classifier JSON: %w", err) + } + + thr := req.GetThreshold() + resp := &pb.SoundDetectionResponse{} + for _, t := range tags { + if t.Score < thr { + continue + } + resp.Detections = append(resp.Detections, &pb.SoundClass{ + Label: t.Label, Score: t.Score, Index: int32(t.Index), + }) + } + sort.Slice(resp.Detections, func(i, j int) bool { + return resp.Detections[i].Score > resp.Detections[j].Score + }) + return resp, nil +} + +func (c *Ced) Free() error { + c.engineMu.Lock() + defer c.engineMu.Unlock() + if c.ctxPtr != 0 { + CppFree(c.ctxPtr) + c.ctxPtr = 0 + } + return nil +} diff --git a/backend/go/ced/main.go b/backend/go/ced/main.go new file mode 100644 index 000000000..b6c93a9f9 --- /dev/null +++ b/backend/go/ced/main.go @@ -0,0 +1,64 @@ +package main + +// ced sound-classification backend. Started internally by LocalAI: one gRPC +// server per loaded model. Loads libced.so via purego and registers the flat +// C-API declared in ced_capi.h. The library name can be overridden with +// CED_LIBRARY (mirrors PARAKEET_LIBRARY / WHISPER_LIBRARY); by default +// libced.so (.dylib on macOS) is found via the search path run.sh exports.
+// +// SKETCH: requires `make protogen-go` after the backend.proto SoundDetection +// addition, and a built libced.so (see Makefile). See DESIGN.md. +import ( + "flag" + "fmt" + "os" + "runtime" + + "github.com/ebitengine/purego" + grpc "github.com/mudler/LocalAI/pkg/grpc" +) + +var addr = flag.String("addr", "localhost:50051", "the address to connect to") + +type libFunc struct { + ptr any + name string +} + +func main() { + libName := os.Getenv("CED_LIBRARY") + if libName == "" { + if runtime.GOOS == "darwin" { + libName = "libced.dylib" + } else { + libName = "libced.so" + } + } + lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) + if err != nil { + panic(fmt.Errorf("ced: dlopen %q: %w", libName, err)) + } + + // Bound 1:1 to ced_capi.h. char*-returning functions are declared uintptr + // so we can free the same pointer with ced_capi_free_string after copying + // (purego's string return would copy and leak the original). + for _, lf := range []libFunc{ + {&CppAbiVersion, "ced_capi_abi_version"}, + {&CppLoad, "ced_capi_load"}, + {&CppFree, "ced_capi_free"}, + {&CppLastError, "ced_capi_last_error"}, + {&CppNumClasses, "ced_capi_num_classes"}, + {&CppSampleRate, "ced_capi_sample_rate"}, + {&CppClassifyPathJSON, "ced_capi_classify_path_json"}, + {&CppClassifyPcmJSON, "ced_capi_classify_pcm_json"}, + {&CppFreeString, "ced_capi_free_string"}, + } { + purego.RegisterLibFunc(lf.ptr, lib, lf.name) + } + + fmt.Fprintf(os.Stderr, "[ced] ABI=%d\n", CppAbiVersion()) + flag.Parse() + if err := grpc.StartServer(*addr, &Ced{}); err != nil { + panic(err) + } +} diff --git a/backend/go/ced/package.sh b/backend/go/ced/package.sh new file mode 100755 index 000000000..ff20d727f --- /dev/null +++ b/backend/go/ced/package.sh @@ -0,0 +1,62 @@ +#!/bin/bash +# +# Bundle the ced-grpc binary, libced.so, the core runtime libs (libc/libstdc++/ +# libgomp + ld.so) and the GPU runtime for the active BUILD_TYPE so the package +# is self-contained. 
Mirrors backend/go/parakeet-cpp/package.sh; run.sh routes +# the (CGO_ENABLED=0) binary through lib/ld.so so the packaged libc is used. + +set -e + +CURDIR=$(dirname "$(realpath "$0")") +REPO_ROOT="${CURDIR}/../../.." + +mkdir -p "$CURDIR/package/lib" + +cp -avf "$CURDIR/ced-grpc" "$CURDIR/package/" +cp -avf "$CURDIR/run.sh" "$CURDIR/package/" + +cp -avf "$CURDIR"/libced.so* "$CURDIR/package/lib/" 2>/dev/null || true +cp -avf "$CURDIR"/libced.dylib "$CURDIR/package/lib/" 2>/dev/null || true +if ! ls "$CURDIR"/package/lib/libced.* >/dev/null 2>&1; then + echo "ERROR: libced shared library not found in $CURDIR, run 'make' first" >&2 + exit 1 +fi + +if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then + echo "Detected x86_64 architecture, copying x86_64 libraries..." + cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so" + cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6" + cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1" + cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6" + cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6" + cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1" + cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2" + cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1" + cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0" +elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then + echo "Detected ARM64 architecture, copying ARM64 libraries..." 
+ cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so" + cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6" + cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1" + cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6" + cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6" + cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1" + cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2" + cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1" + cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0" +elif [ "$(uname -s)" = "Darwin" ]; then + echo "Detected Darwin" +else + echo "Error: Could not detect architecture" + exit 1 +fi + +GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh" +if [ -f "$GPU_LIB_SCRIPT" ]; then + echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..." + source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib" + package_gpu_libs +fi + +echo "Packaging completed successfully" +ls -liah "$CURDIR/package/" "$CURDIR/package/lib/" diff --git a/backend/go/ced/run.sh b/backend/go/ced/run.sh new file mode 100755 index 000000000..1f95f748f --- /dev/null +++ b/backend/go/ced/run.sh @@ -0,0 +1,20 @@ +#!/bin/bash +set -e + +CURDIR=$(dirname "$(realpath "$0")") + +if [ "$(uname)" = "Darwin" ]; then + export DYLD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${DYLD_LIBRARY_PATH:-}" + export CED_LIBRARY="$CURDIR/lib/libced.dylib" +else + export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}" +fi + +# If a self-contained ld.so was packaged, route through it so the packaged +# libc / libstdc++ are used instead of the host's (matches the sibling backends). 
+if [ -f "$CURDIR/lib/ld.so" ]; then + echo "Using lib/ld.so" + exec "$CURDIR/lib/ld.so" "$CURDIR/ced-grpc" "$@" +fi + +exec "$CURDIR/ced-grpc" "$@" diff --git a/backend/go/crispasr/Makefile b/backend/go/crispasr/Makefile index 42a7a7555..1b32240e3 100644 --- a/backend/go/crispasr/Makefile +++ b/backend/go/crispasr/Makefile @@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1) # CrispASR version (release tag) CRISPASR_REPO?=https://github.com/CrispStrobe/CrispASR -CRISPASR_VERSION?=d745bda4386ae0f9d1d2f23fff8ec95d76428221 +CRISPASR_VERSION?=96b2a6ee31d30389fed8a7ef1a54239b75231ddc SO_TARGET?=libgocrispasr.so CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF @@ -67,7 +67,7 @@ sources/CrispASR: # it, so ${CMAKE_SOURCE_DIR} is THIS backend dir and the talk-llama sources # aren't found. Rewrite to ${PROJECT_SOURCE_DIR} (the crispasr project root), # which is correct both standalone and as a subproject. Idempotent. - sed -i 's#\$${CMAKE_SOURCE_DIR}/examples/talk-llama#\$${PROJECT_SOURCE_DIR}/examples/talk-llama#' sources/CrispASR/src/CMakeLists.txt + sed -i.bak 's#\$${CMAKE_SOURCE_DIR}/examples/talk-llama#\$${PROJECT_SOURCE_DIR}/examples/talk-llama#' sources/CrispASR/src/CMakeLists.txt && rm -f sources/CrispASR/src/CMakeLists.txt.bak # Detect OS UNAME_S := $(shell uname -s) @@ -75,7 +75,8 @@ UNAME_S := $(shell uname -s) ifeq ($(UNAME_S),Linux) VARIANT_TARGETS = libgocrispasr-avx.so libgocrispasr-avx2.so libgocrispasr-avx512.so libgocrispasr-fallback.so else - VARIANT_TARGETS = libgocrispasr-fallback.so + # On non-Linux (e.g., Darwin), build only fallback variant (as a dylib) + VARIANT_TARGETS = libgocrispasr-fallback.dylib endif crispasr: main.go gocrispasr.go $(VARIANT_TARGETS) @@ -87,7 +88,7 @@ package: crispasr build: package clean: purge - rm -rf libgocrispasr*.so package sources/CrispASR crispasr + rm -rf libgocrispasr*.so libgocrispasr*.dylib package sources/CrispASR crispasr purge: rm -rf build* @@ -118,13 +119,21 @@ libgocrispasr-fallback.so: sources/CrispASR 
SO_TARGET=libgocrispasr-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgocrispasr-custom rm -rfv build* +# Build fallback variant as a dylib (Darwin) +libgocrispasr-fallback.dylib: sources/CrispASR + $(MAKE) purge + $(info ${GREEN}I crispasr build info:fallback (dylib)${RESET}) + SO_TARGET=libgocrispasr-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgocrispasr-custom + rm -rfv build* + libgocrispasr-custom: CMakeLists.txt cpp/crispasr_shim.cpp cpp/crispasr_shim.h mkdir -p build-$(SO_TARGET) && \ cd build-$(SO_TARGET) && \ cmake .. $(CMAKE_ARGS) && \ cmake --build . --config Release -j$(JOBS) && \ cd .. && \ - mv build-$(SO_TARGET)/libgocrispasr.so ./$(SO_TARGET) + (mv build-$(SO_TARGET)/libgocrispasr.so ./$(SO_TARGET) 2>/dev/null || \ + mv build-$(SO_TARGET)/libgocrispasr.dylib ./$(SO_TARGET) 2>/dev/null) test: crispasr CGO_ENABLED=0 $(GOCMD) test -v ./... 
diff --git a/backend/go/crispasr/cpp/crispasr_shim.cpp b/backend/go/crispasr/cpp/crispasr_shim.cpp index bf6151ae1..60dbfd86b 100644 --- a/backend/go/crispasr/cpp/crispasr_shim.cpp +++ b/backend/go/crispasr/cpp/crispasr_shim.cpp @@ -47,6 +47,74 @@ extern "C" void set_abort(int v) { g_abort.store(v, std::memory_order_relaxed); } +// --- word-level timestamp accessors --- +extern "C" { +int crispasr_session_result_n_words(crispasr_session_result *r, int seg_i); +const char *crispasr_session_result_word_text(crispasr_session_result *r, + int seg_i, int word_i); +int64_t crispasr_session_result_word_t0(crispasr_session_result *r, int seg_i, + int word_i); +int64_t crispasr_session_result_word_t1(crispasr_session_result *r, int seg_i, + int word_i); + +// Parakeet-specific word accessors +int crispasr_parakeet_result_n_words(void *r); +const char *crispasr_parakeet_result_word_text(void *r, int word_i); +int64_t crispasr_parakeet_result_word_t0(void *r, int word_i); +int64_t crispasr_parakeet_result_word_t1(void *r, int word_i); +} + +void *get_result(void) { return g_result; } + +int get_word_count(int seg_i) { + if (!g_result) + return 0; + return crispasr_session_result_n_words(g_result, seg_i); +} + +const char *get_word_text(int seg_i, int word_i) { + if (!g_result) + return ""; + return crispasr_session_result_word_text(g_result, seg_i, word_i); +} + +int64_t get_word_t0(int seg_i, int word_i) { + if (!g_result) + return 0; + return crispasr_session_result_word_t0(g_result, seg_i, word_i); +} + +int64_t get_word_t1(int seg_i, int word_i) { + if (!g_result) + return 0; + return crispasr_session_result_word_t1(g_result, seg_i, word_i); +} + +// Parakeet-specific word accessors +int get_parakeet_word_count(void) { + if (!g_result) + return 0; + return crispasr_parakeet_result_n_words(g_result); +} + +const char *get_parakeet_word_text(int word_i) { + if (!g_result) + return ""; + return crispasr_parakeet_result_word_text(g_result, word_i); +} + +int64_t 
get_parakeet_word_t0(int word_i) { + if (!g_result) + return 0; + return crispasr_parakeet_result_word_t0(g_result, word_i); +} + +int64_t get_parakeet_word_t1(int word_i) { + if (!g_result) + return 0; + return crispasr_parakeet_result_word_t1(g_result, word_i); +} + static void ggml_log_cb(enum ggml_log_level level, const char *log, void *data) { const char *level_str; diff --git a/backend/go/crispasr/cpp/crispasr_shim.h b/backend/go/crispasr/cpp/crispasr_shim.h index 7c593951a..c7baa41f4 100644 --- a/backend/go/crispasr/cpp/crispasr_shim.h +++ b/backend/go/crispasr/cpp/crispasr_shim.h @@ -20,4 +20,18 @@ float *tts_synthesize(const char *text, int *out_n_samples); // 24kHz mono float void tts_free(float *pcm); int tts_set_voice(const char *name); // best-effort speaker selection; 0 ok int tts_set_voice_file(const char *path, const char *ref_text); // load voice pack (.gguf) or zero-shot clone (.wav + ref_text) + +// --- word-level timestamp accessors --- +// Session-based (works for whisper-like backends) +void *get_result(void); +int get_word_count(int seg_i); +const char *get_word_text(int seg_i, int word_i); +int64_t get_word_t0(int seg_i, int word_i); +int64_t get_word_t1(int seg_i, int word_i); + +// Parakeet-specific (global word list, no segment index) +int get_parakeet_word_count(void); +const char *get_parakeet_word_text(int word_i); +int64_t get_parakeet_word_t0(int word_i); +int64_t get_parakeet_word_t1(int word_i); } diff --git a/backend/go/crispasr/gocrispasr.go b/backend/go/crispasr/gocrispasr.go index 5c3528d38..2cbfb0d4a 100644 --- a/backend/go/crispasr/gocrispasr.go +++ b/backend/go/crispasr/gocrispasr.go @@ -34,6 +34,18 @@ var ( CppTTSFree func(ptr uintptr) CppTTSSetVoice func(name string) int CppTTSSetVoiceFile func(path string, refText string) int + + // Word-level timestamp accessors (session-based, per-segment) + CppGetWordCount func(segI int) int + CppGetWordText func(segI int, wordI int) string + CppGetWordT0 func(segI int, wordI int) 
int64 + CppGetWordT1 func(segI int, wordI int) int64 + + // Parakeet-specific word accessors (global, no segment index) + CppGetParakeetWordCount func() int + CppGetParakeetWordText func(wordI int) string + CppGetParakeetWordT0 func(wordI int) int64 + CppGetParakeetWordT1 func(wordI int) int64 ) type CrispASR struct { @@ -212,6 +224,28 @@ func (w *CrispASR) VAD(req *pb.VADRequest) (pb.VADResponse, error) { }, nil } +// isValidWord reports whether a TranscriptWord contains recognisable speech +// content. The parakeet-specific word accessors can return stale initialisation +// data (model name, binary blobs) when a segment has no real speech. A word is +// considered valid only when: +// - the text is non-empty after trimming, +// - it contains no U+FFFD replacement characters (from binary data scrubbing), +// - both timestamps are non-negative, +// - the word has positive duration (end > start). +func isValidWord(w *pb.TranscriptWord) bool { + txt := strings.TrimSpace(w.Text) + if txt == "" { + return false + } + if strings.ContainsRune(txt, '\uFFFD') { + return false + } + if w.Start < 0 || w.End < 0 || w.End <= w.Start { + return false + } + return true +} + func (w *CrispASR) AudioTranscription(ctx context.Context, opts *pb.TranscriptRequest) (pb.TranscriptResult, error) { if err := ctx.Err(); err != nil { return pb.TranscriptResult{}, status.Error(codes.Canceled, "transcription cancelled") @@ -290,15 +324,54 @@ func (w *CrispASR) AudioTranscription(ctx context.Context, opts *pb.TranscriptRe // IDs, so Tokens is left empty. txt := strings.ToValidUTF8(strings.Clone(CppGetSegmentText(i)), "�") + // Populate word-level timestamps. Try session-based functions first + // (per-segment); fall back to parakeet-specific functions (global word + // list with no segment index — only populated on the first segment to + // avoid duplication). 
+ words := []*pb.TranscriptWord{} + wordCount := CppGetWordCount(i) + if wordCount == 0 && i == 0 { + wordCount = CppGetParakeetWordCount() + for j := 0; j < wordCount; j++ { + w := &pb.TranscriptWord{ + Start: CppGetParakeetWordT0(j) * (10000000), + End: CppGetParakeetWordT1(j) * (10000000), + Text: strings.ToValidUTF8(strings.Clone(CppGetParakeetWordText(j)), "�"), + } + if isValidWord(w) { + words = append(words, w) + } + } + } else { + for j := 0; j < wordCount; j++ { + w := &pb.TranscriptWord{ + Start: CppGetWordT0(i, j) * (10000000), + End: CppGetWordT1(i, j) * (10000000), + Text: strings.ToValidUTF8(strings.Clone(CppGetWordText(i, j)), "�"), + } + if isValidWord(w) { + words = append(words, w) + } + } + } + + // Skip empty segments with no recognisable content (e.g. trailing + // silence segments that parakeet emits with stale init data). + trimmed := strings.TrimSpace(txt) + if trimmed == "" && len(words) == 0 { + continue + } + segment := &pb.TranscriptSegment{ Id: int32(i), Text: txt, Start: s, End: t, + Words: words, } segments = append(segments, segment) - text += " " + strings.TrimSpace(txt) + text += " " + trimmed } return pb.TranscriptResult{ @@ -390,13 +463,20 @@ func (w *CrispASR) AudioTranscriptionStream(ctx context.Context, opts *pb.Transc s := CppGetSegmentStart(i) * 10000000 t := CppGetSegmentEnd(i) * 10000000 txt := strings.ToValidUTF8(strings.Clone(CppGetSegmentText(i)), "�") + + // Skip empty segments (e.g. trailing silence that parakeet emits + // with stale init data). 
+ trimmed := strings.TrimSpace(txt) + if trimmed == "" && s == t { + continue + } + segments = append(segments, &pb.TranscriptSegment{ Id: int32(i), Text: txt, Start: s, End: t, }) - trimmed := strings.TrimSpace(txt) if trimmed == "" { continue } diff --git a/backend/go/crispasr/main.go b/backend/go/crispasr/main.go index c2069bd85..a1f132cc5 100644 --- a/backend/go/crispasr/main.go +++ b/backend/go/crispasr/main.go @@ -4,6 +4,7 @@ package main import ( "flag" "os" + "runtime" "github.com/ebitengine/purego" grpc "github.com/mudler/LocalAI/pkg/grpc" @@ -21,7 +22,11 @@ type LibFuncs struct { func main() { libName := os.Getenv("CRISPASR_LIBRARY") if libName == "" { - libName = "./libgocrispasr-fallback.so" + if runtime.GOOS == "darwin" { + libName = "./libgocrispasr-fallback.dylib" + } else { + libName = "./libgocrispasr-fallback.so" + } } lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) @@ -44,6 +49,14 @@ func main() { {&CppTTSFree, "tts_free"}, {&CppTTSSetVoice, "tts_set_voice"}, {&CppTTSSetVoiceFile, "tts_set_voice_file"}, + {&CppGetWordCount, "get_word_count"}, + {&CppGetWordText, "get_word_text"}, + {&CppGetWordT0, "get_word_t0"}, + {&CppGetWordT1, "get_word_t1"}, + {&CppGetParakeetWordCount, "get_parakeet_word_count"}, + {&CppGetParakeetWordText, "get_parakeet_word_text"}, + {&CppGetParakeetWordT0, "get_parakeet_word_t0"}, + {&CppGetParakeetWordT1, "get_parakeet_word_t1"}, } for _, lf := range libFuncs { diff --git a/backend/go/crispasr/package.sh b/backend/go/crispasr/package.sh index baee12944..9b89dad1b 100755 --- a/backend/go/crispasr/package.sh +++ b/backend/go/crispasr/package.sh @@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.." 
mkdir -p $CURDIR/package/lib cp -avf $CURDIR/crispasr $CURDIR/package/ -cp -fv $CURDIR/libgocrispasr-*.so $CURDIR/package/ +cp -fv $CURDIR/libgocrispasr-*.so $CURDIR/package/ 2>/dev/null || true +cp -fv $CURDIR/libgocrispasr-*.dylib $CURDIR/package/ 2>/dev/null || true cp -fv $CURDIR/run.sh $CURDIR/package/ # Detect architecture and copy appropriate libraries diff --git a/backend/go/crispasr/run.sh b/backend/go/crispasr/run.sh index ccb264833..6d3c4b216 100755 --- a/backend/go/crispasr/run.sh +++ b/backend/go/crispasr/run.sh @@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then grep -e "flags" /proc/cpuinfo | head -1 fi -LIBRARY="$CURDIR/libgocrispasr-fallback.so" +if [ "$(uname)" = "Darwin" ]; then + # macOS: single dylib variant (Metal or Accelerate) + LIBRARY="$CURDIR/libgocrispasr-fallback.dylib" + export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH +else + LIBRARY="$CURDIR/libgocrispasr-fallback.so" -if [ "$(uname)" != "Darwin" ]; then if grep -q -e "\savx\s" /proc/cpuinfo ; then echo "CPU: AVX found OK" if [ -e $CURDIR/libgocrispasr-avx.so ]; then @@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then LIBRARY="$CURDIR/libgocrispasr-avx512.so" fi fi + + export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH fi -export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH export CRISPASR_LIBRARY=$LIBRARY # Point piper's espeak-ng phonemizer at the bundled voice data. The variable diff --git a/backend/go/depth-anything-cpp/Makefile b/backend/go/depth-anything-cpp/Makefile index 815d2b0db..e142607ab 100644 --- a/backend/go/depth-anything-cpp/Makefile +++ b/backend/go/depth-anything-cpp/Makefile @@ -8,11 +8,13 @@ JOBS?=$(shell nproc --ignore=1) # depth-anything.cpp. Pin to a specific commit for a stable build; a squash # merge upstream can orphan a branch, so the native version is pinned by SHA. -# This SHA adds the nested two-file metric C-API (abi_version 4, -# da_capi_load_nested) required by the depth-anything-3-nested gallery model; -# tag it (e.g. 
v0.1.3) upstream to keep the SHA alive. +# This SHA adds the Depth Anything V2 engine + C-API routing (depth-only, +# relative + metric) on top of the nested two-file metric C-API (abi_version 4, +# da_capi_load_nested) required by the depth-anything-3-nested gallery model. +# It is kept alive by the upstream tag da2-support (survives a squash-merge); +# repoint to the master merge commit once mudler/depth-anything.cpp PR #1 lands. DEPTHANYTHING_REPO?=https://github.com/mudler/depth-anything.cpp.git -DEPTHANYTHING_VERSION?=cce5edc395fd1843806093d7ccc0c8b0d0b97b72 +DEPTHANYTHING_VERSION?=f4e17dea695dd12ae76bea98ba58030996b98118 ifeq ($(NATIVE),false) CMAKE_ARGS+=-DGGML_NATIVE=OFF @@ -38,6 +40,8 @@ else ifeq ($(BUILD_TYPE),hipblas) else ifeq ($(BUILD_TYPE),vulkan) CMAKE_ARGS+=-DGGML_VULKAN=ON -DDA_GGML_VULKAN=ON else ifeq ($(OS),Darwin) + # macOS/Metal: built + published as an OCI image by CI (includeDarwin in + # .github/backend-matrix.yml) so Apple Silicon users can install this backend. 
ifneq ($(BUILD_TYPE),metal) CMAKE_ARGS+=-DGGML_METAL=OFF else @@ -75,7 +79,7 @@ ifeq ($(UNAME_S),Linux) VARIANT_TARGETS = libdepthanythingcpp-avx.so libdepthanythingcpp-avx2.so libdepthanythingcpp-avx512.so libdepthanythingcpp-fallback.so else # On non-Linux (e.g., Darwin), build only fallback variant - VARIANT_TARGETS = libdepthanythingcpp-fallback.so + VARIANT_TARGETS = libdepthanythingcpp-fallback.dylib endif depth-anything-cpp: main.go godepthanythingcpp.go $(VARIANT_TARGETS) @@ -87,7 +91,7 @@ package: depth-anything-cpp build: package clean: purge - rm -rf libdepthanythingcpp*.so depth-anything-cpp package sources + rm -rf libdepthanythingcpp*.so libdepthanythingcpp*.dylib depth-anything-cpp package sources purge: rm -rf build* @@ -114,11 +118,19 @@ libdepthanythingcpp-avx512.so: sources/depth-anything.cpp endif # Build fallback variant (all platforms) +ifeq ($(UNAME_S),Darwin) +libdepthanythingcpp-fallback.dylib: sources/depth-anything.cpp + rm -rfv build-$@ + $(info ${GREEN}I depth-anything-cpp build info:fallback${RESET}) + SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom + rm -rfv build-$@ +else libdepthanythingcpp-fallback.so: sources/depth-anything.cpp rm -rfv build-$@ $(info ${GREEN}I depth-anything-cpp build info:fallback${RESET}) SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libdepthanythingcpp-custom rm -rfv build-$@ +endif libdepthanythingcpp-custom: CMakeLists.txt mkdir -p build-$(SO_TARGET) && \ @@ -126,7 +138,8 @@ libdepthanythingcpp-custom: CMakeLists.txt cmake .. $(CMAKE_ARGS) && \ cmake --build . --config Release -j$(JOBS) && \ cd .. 
&& \ - mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET) + (mv build-$(SO_TARGET)/libdepthanything.so ./$(SO_TARGET) 2>/dev/null || \ + mv build-$(SO_TARGET)/libdepthanything.dylib ./$(SO_TARGET) 2>/dev/null) all: depth-anything-cpp package diff --git a/backend/go/depth-anything-cpp/main.go b/backend/go/depth-anything-cpp/main.go index 4c4546797..cfad88b23 100644 --- a/backend/go/depth-anything-cpp/main.go +++ b/backend/go/depth-anything-cpp/main.go @@ -9,6 +9,7 @@ package main import ( "flag" "os" + "runtime" "github.com/ebitengine/purego" grpc "github.com/mudler/LocalAI/pkg/grpc" @@ -27,7 +28,11 @@ func main() { // Get library name from environment variable, default to fallback libName := os.Getenv("DEPTHANYTHING_LIBRARY") if libName == "" { - libName = "./libdepthanythingcpp-fallback.so" + if runtime.GOOS == "darwin" { + libName = "./libdepthanythingcpp-fallback.dylib" + } else { + libName = "./libdepthanythingcpp-fallback.so" + } } lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) diff --git a/backend/go/depth-anything-cpp/package.sh b/backend/go/depth-anything-cpp/package.sh index 4690555ea..5bbd5559b 100755 --- a/backend/go/depth-anything-cpp/package.sh +++ b/backend/go/depth-anything-cpp/package.sh @@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.." 
# Create lib directory mkdir -p $CURDIR/package/lib -cp -avf $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/ +cp -fv $CURDIR/libdepthanythingcpp-*.so $CURDIR/package/ 2>/dev/null || true +cp -fv $CURDIR/libdepthanythingcpp-*.dylib $CURDIR/package/ 2>/dev/null || true cp -avf $CURDIR/depth-anything-cpp $CURDIR/package/ cp -fv $CURDIR/run.sh $CURDIR/package/ diff --git a/backend/go/depth-anything-cpp/run.sh b/backend/go/depth-anything-cpp/run.sh index 984aa5849..cbff6b0b5 100755 --- a/backend/go/depth-anything-cpp/run.sh +++ b/backend/go/depth-anything-cpp/run.sh @@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then grep -e "flags" /proc/cpuinfo | head -1 fi -LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so" +if [ "$(uname)" = "Darwin" ]; then + # macOS: single dylib variant (Metal or Accelerate) + LIBRARY="$CURDIR/libdepthanythingcpp-fallback.dylib" + export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH +else + LIBRARY="$CURDIR/libdepthanythingcpp-fallback.so" -if [ "$(uname)" != "Darwin" ]; then if grep -q -e "\savx\s" /proc/cpuinfo ; then echo "CPU: AVX found OK" if [ -e $CURDIR/libdepthanythingcpp-avx.so ]; then @@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then LIBRARY="$CURDIR/libdepthanythingcpp-avx512.so" fi fi + + export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH fi -export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH export DEPTHANYTHING_LIBRARY=$LIBRARY # If there is a lib/ld.so, use it diff --git a/backend/go/localvqe/Makefile b/backend/go/localvqe/Makefile index 7b66e9371..58b73c3b9 100644 --- a/backend/go/localvqe/Makefile +++ b/backend/go/localvqe/Makefile @@ -32,6 +32,8 @@ endif ifeq ($(BUILD_TYPE),vulkan) CMAKE_ARGS+=-DGGML_VULKAN=ON -DLOCALVQE_VULKAN=ON else ifeq ($(OS),Darwin) + # Apple Silicon: CPU-only (no Metal upstream); built + published as an arm64 + # image by CI (includeDarwin in .github/backend-matrix.yml) for macOS install. 
CMAKE_ARGS+=-DGGML_METAL=OFF endif @@ -67,8 +69,9 @@ $(LIB_SENTINEL): sources/LocalVQE # that the loader picks at runtime. We must build every target — the # default `--target localvqe_shared` drops these. CMAKE_LIBRARY_OUTPUT_DIRECTORY # routes all of them into build/bin; copy them out next to the binary. - cp -P build/bin/liblocalvqe.so* . 2>/dev/null || cp -P build/liblocalvqe.so* . + cp -P build/bin/liblocalvqe.so* . 2>/dev/null || cp -P build/bin/liblocalvqe.dylib . 2>/dev/null || cp -P build/liblocalvqe.so* . 2>/dev/null || cp -P build/liblocalvqe.dylib . cp -P build/bin/libggml*.so* . 2>/dev/null || true + cp -P build/bin/libggml*.dylib . 2>/dev/null || true touch $(LIB_SENTINEL) liblocalvqe.so: $(LIB_SENTINEL) diff --git a/backend/go/localvqe/main.go b/backend/go/localvqe/main.go index 56ed2de2f..cbaa2a134 100644 --- a/backend/go/localvqe/main.go +++ b/backend/go/localvqe/main.go @@ -4,6 +4,7 @@ package main import ( "flag" "os" + "runtime" "github.com/ebitengine/purego" grpc "github.com/mudler/LocalAI/pkg/grpc" @@ -21,7 +22,11 @@ type LibFuncs struct { func main() { libName := os.Getenv("LOCALVQE_LIBRARY") if libName == "" { - libName = "./liblocalvqe.so" + if runtime.GOOS == "darwin" { + libName = "./liblocalvqe.dylib" + } else { + libName = "./liblocalvqe.so" + } } lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) diff --git a/backend/go/localvqe/package.sh b/backend/go/localvqe/package.sh index ca8dfd3ab..9f9f2533d 100755 --- a/backend/go/localvqe/package.sh +++ b/backend/go/localvqe/package.sh @@ -15,7 +15,9 @@ cp -avf $CURDIR/localvqe $CURDIR/package/ # liblocalvqe.so* (with SOVERSION symlinks) and the libggml-*.so runtime # variants — LocalVQE picks the matching CPU variant at load time. 
cp -P $CURDIR/liblocalvqe.so* $CURDIR/package/ 2>/dev/null || true +cp -P $CURDIR/liblocalvqe.dylib $CURDIR/package/ 2>/dev/null || true cp -P $CURDIR/libggml*.so* $CURDIR/package/ 2>/dev/null || true +cp -P $CURDIR/libggml*.dylib $CURDIR/package/ 2>/dev/null || true cp -fv $CURDIR/run.sh $CURDIR/package/ # Detect architecture and copy appropriate libraries diff --git a/backend/go/localvqe/run.sh b/backend/go/localvqe/run.sh index 0f3192e31..d14d427c4 100755 --- a/backend/go/localvqe/run.sh +++ b/backend/go/localvqe/run.sh @@ -10,8 +10,19 @@ CURDIR=$(dirname "$(realpath $0)") # exec'ing the binary. cd "$CURDIR" -export LD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$LD_LIBRARY_PATH -export LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so +if [ "$(uname)" = "Darwin" ]; then + # macOS: LocalVQE is built as a SHARED library, so dyld needs the .dylib + + # DYLD_LIBRARY_PATH. Prefer .dylib and fall back to .so just in case. + export DYLD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$DYLD_LIBRARY_PATH + LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.dylib + if [ ! -e "$LOCALVQE_LIBRARY" ]; then + LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so + fi + export LOCALVQE_LIBRARY +else + export LD_LIBRARY_PATH=$CURDIR:$CURDIR/lib:$LD_LIBRARY_PATH + export LOCALVQE_LIBRARY=$CURDIR/liblocalvqe.so +fi if [ -f $CURDIR/lib/ld.so ]; then echo "Using lib/ld.so" diff --git a/backend/go/locate-anything-cpp/Makefile b/backend/go/locate-anything-cpp/Makefile index 91dbc41c2..c66d57764 100644 --- a/backend/go/locate-anything-cpp/Makefile +++ b/backend/go/locate-anything-cpp/Makefile @@ -33,6 +33,8 @@ else ifeq ($(BUILD_TYPE),hipblas) else ifeq ($(BUILD_TYPE),vulkan) CMAKE_ARGS+=-DGGML_VULKAN=ON -DLA_GGML_VULKAN=ON else ifeq ($(OS),Darwin) + # macOS/Metal: built + published as an OCI image by CI (includeDarwin in + # .github/backend-matrix.yml) so Apple Silicon users can install this backend. 
ifneq ($(BUILD_TYPE),metal) CMAKE_ARGS+=-DGGML_METAL=OFF else @@ -70,7 +72,7 @@ ifeq ($(UNAME_S),Linux) VARIANT_TARGETS = liblocateanythingcpp-avx.so liblocateanythingcpp-avx2.so liblocateanythingcpp-avx512.so liblocateanythingcpp-fallback.so else # On non-Linux (e.g., Darwin), build only fallback variant - VARIANT_TARGETS = liblocateanythingcpp-fallback.so + VARIANT_TARGETS = liblocateanythingcpp-fallback.dylib endif locate-anything-cpp: main.go golocateanythingcpp.go $(VARIANT_TARGETS) @@ -82,7 +84,7 @@ package: locate-anything-cpp build: package clean: purge - rm -rf liblocateanythingcpp*.so locate-anything-cpp package sources + rm -rf liblocateanythingcpp*.so liblocateanythingcpp*.dylib locate-anything-cpp package sources purge: rm -rf build* @@ -109,11 +111,19 @@ liblocateanythingcpp-avx512.so: sources/locate-anything.cpp endif # Build fallback variant (all platforms) +ifeq ($(UNAME_S),Darwin) +liblocateanythingcpp-fallback.dylib: sources/locate-anything.cpp + rm -rfv build-$@ + $(info ${GREEN}I locate-anything-cpp build info:fallback${RESET}) + SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom + rm -rfv build-$@ +else liblocateanythingcpp-fallback.so: sources/locate-anything.cpp rm -rfv build-$@ $(info ${GREEN}I locate-anything-cpp build info:fallback${RESET}) SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) liblocateanythingcpp-custom rm -rfv build-$@ +endif liblocateanythingcpp-custom: CMakeLists.txt mkdir -p build-$(SO_TARGET) && \ @@ -121,7 +131,8 @@ liblocateanythingcpp-custom: CMakeLists.txt cmake .. $(CMAKE_ARGS) && \ cmake --build . --config Release -j$(JOBS) && \ cd .. 
&& \ - mv build-$(SO_TARGET)/liblocateanythingcpp.so ./$(SO_TARGET) + (mv build-$(SO_TARGET)/liblocateanythingcpp.so ./$(SO_TARGET) 2>/dev/null || \ + mv build-$(SO_TARGET)/liblocateanythingcpp.dylib ./$(SO_TARGET) 2>/dev/null) all: locate-anything-cpp package diff --git a/backend/go/locate-anything-cpp/main.go b/backend/go/locate-anything-cpp/main.go index 91ccaf38e..77e53bb95 100644 --- a/backend/go/locate-anything-cpp/main.go +++ b/backend/go/locate-anything-cpp/main.go @@ -9,6 +9,7 @@ package main import ( "flag" "os" + "runtime" "github.com/ebitengine/purego" grpc "github.com/mudler/LocalAI/pkg/grpc" @@ -27,7 +28,11 @@ func main() { // Get library name from environment variable, default to fallback libName := os.Getenv("LOCATEANYTHING_LIBRARY") if libName == "" { - libName = "./liblocateanythingcpp-fallback.so" + if runtime.GOOS == "darwin" { + libName = "./liblocateanythingcpp-fallback.dylib" + } else { + libName = "./liblocateanythingcpp-fallback.so" + } } lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) diff --git a/backend/go/locate-anything-cpp/package.sh b/backend/go/locate-anything-cpp/package.sh index 3b1f13428..1e6cbee80 100755 --- a/backend/go/locate-anything-cpp/package.sh +++ b/backend/go/locate-anything-cpp/package.sh @@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.." 
# Create lib directory mkdir -p $CURDIR/package/lib -cp -avf $CURDIR/liblocateanythingcpp-*.so $CURDIR/package/ +cp -fv $CURDIR/liblocateanythingcpp-*.so $CURDIR/package/ 2>/dev/null || true +cp -fv $CURDIR/liblocateanythingcpp-*.dylib $CURDIR/package/ 2>/dev/null || true cp -avf $CURDIR/locate-anything-cpp $CURDIR/package/ cp -fv $CURDIR/run.sh $CURDIR/package/ diff --git a/backend/go/locate-anything-cpp/run.sh b/backend/go/locate-anything-cpp/run.sh index cefbff629..4eebb3c63 100755 --- a/backend/go/locate-anything-cpp/run.sh +++ b/backend/go/locate-anything-cpp/run.sh @@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then grep -e "flags" /proc/cpuinfo | head -1 fi -LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so" +if [ "$(uname)" = "Darwin" ]; then + # macOS: single dylib variant (Metal or Accelerate) + LIBRARY="$CURDIR/liblocateanythingcpp-fallback.dylib" + export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH +else + LIBRARY="$CURDIR/liblocateanythingcpp-fallback.so" -if [ "$(uname)" != "Darwin" ]; then if grep -q -e "\savx\s" /proc/cpuinfo ; then echo "CPU: AVX found OK" if [ -e $CURDIR/liblocateanythingcpp-avx.so ]; then @@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then LIBRARY="$CURDIR/liblocateanythingcpp-avx512.so" fi fi + + export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH fi -export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH export LOCATEANYTHING_LIBRARY=$LIBRARY # If there is a lib/ld.so, use it diff --git a/backend/go/omnivoice-cpp/Makefile b/backend/go/omnivoice-cpp/Makefile index 7806ce11f..36b447b13 100644 --- a/backend/go/omnivoice-cpp/Makefile +++ b/backend/go/omnivoice-cpp/Makefile @@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1) # omnivoice.cpp version OMNIVOICE_REPO?=https://github.com/ServeurpersoCom/omnivoice.cpp -OMNIVOICE_VERSION?=2603355a5dfacae5cfc33531d5d0933221843509 +OMNIVOICE_VERSION?=0f37401bebe9b20c0160a888e592108fc1d17607 SO_TARGET?=libgomnivoicecpp.so CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF @@ -65,7 +65,8 @@ UNAME_S := 
$(shell uname -s) ifeq ($(UNAME_S),Linux) VARIANT_TARGETS = libgomnivoicecpp-avx.so libgomnivoicecpp-avx2.so libgomnivoicecpp-avx512.so libgomnivoicecpp-fallback.so else - VARIANT_TARGETS = libgomnivoicecpp-fallback.so + # On non-Linux (e.g., Darwin), build only fallback variant (as a dylib) + VARIANT_TARGETS = libgomnivoicecpp-fallback.dylib endif omnivoice-cpp: main.go gomnivoicecpp.go $(VARIANT_TARGETS) @@ -77,7 +78,7 @@ package: omnivoice-cpp build: package clean: purge - rm -rf libgomnivoicecpp*.so package sources/omnivoice.cpp omnivoice-cpp + rm -rf libgomnivoicecpp*.so libgomnivoicecpp*.dylib package sources/omnivoice.cpp omnivoice-cpp purge: rm -rf build* @@ -106,13 +107,20 @@ libgomnivoicecpp-fallback.so: sources/omnivoice.cpp SO_TARGET=libgomnivoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom rm -rf build-libgomnivoicecpp-fallback.so +# Build fallback variant as a dylib (Darwin) +libgomnivoicecpp-fallback.dylib: sources/omnivoice.cpp + $(info ${GREEN}I omnivoice-cpp build info:fallback (dylib)${RESET}) + SO_TARGET=libgomnivoicecpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgomnivoicecpp-custom + rm -rf build-libgomnivoicecpp-fallback.dylib + libgomnivoicecpp-custom: CMakeLists.txt cpp/gomnivoicecpp.cpp cpp/gomnivoicecpp.h mkdir -p build-$(SO_TARGET) && \ cd build-$(SO_TARGET) && \ cmake .. $(CMAKE_ARGS) && \ cmake --build . --config Release -j$(JOBS) --target gomnivoicecpp && \ cd .. && \ - mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET) + (mv build-$(SO_TARGET)/libgomnivoicecpp.so ./$(SO_TARGET) 2>/dev/null || \ + mv build-$(SO_TARGET)/libgomnivoicecpp.dylib ./$(SO_TARGET) 2>/dev/null) test: omnivoice-cpp @echo "Running omnivoice-cpp tests..." 
diff --git a/backend/go/omnivoice-cpp/main.go b/backend/go/omnivoice-cpp/main.go index 891201f49..f44eb31a7 100644 --- a/backend/go/omnivoice-cpp/main.go +++ b/backend/go/omnivoice-cpp/main.go @@ -4,6 +4,7 @@ package main import ( "flag" "os" + "runtime" "github.com/ebitengine/purego" grpc "github.com/mudler/LocalAI/pkg/grpc" @@ -21,7 +22,11 @@ type LibFuncs struct { func main() { libName := os.Getenv("OMNIVOICE_LIBRARY") if libName == "" { - libName = "./libgomnivoicecpp-fallback.so" + if runtime.GOOS == "darwin" { + libName = "./libgomnivoicecpp-fallback.dylib" + } else { + libName = "./libgomnivoicecpp-fallback.so" + } } lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) diff --git a/backend/go/omnivoice-cpp/package.sh b/backend/go/omnivoice-cpp/package.sh index b8313d9d7..97a8d7809 100755 --- a/backend/go/omnivoice-cpp/package.sh +++ b/backend/go/omnivoice-cpp/package.sh @@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.." mkdir -p $CURDIR/package/lib cp -avf $CURDIR/omnivoice-cpp $CURDIR/package/ -cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/ +cp -fv $CURDIR/libgomnivoicecpp-*.so $CURDIR/package/ 2>/dev/null || true +cp -fv $CURDIR/libgomnivoicecpp-*.dylib $CURDIR/package/ 2>/dev/null || true cp -fv $CURDIR/run.sh $CURDIR/package/ # Detect architecture and copy appropriate libraries diff --git a/backend/go/omnivoice-cpp/run.sh b/backend/go/omnivoice-cpp/run.sh index f677ca21c..81ea2b719 100755 --- a/backend/go/omnivoice-cpp/run.sh +++ b/backend/go/omnivoice-cpp/run.sh @@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then grep -e "flags" /proc/cpuinfo | head -1 fi -LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so" +if [ "$(uname)" = "Darwin" ]; then + # macOS: single dylib variant (Metal or Accelerate) + LIBRARY="$CURDIR/libgomnivoicecpp-fallback.dylib" + export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH +else + LIBRARY="$CURDIR/libgomnivoicecpp-fallback.so" -if [ "$(uname)" != "Darwin" ]; then if grep -q -e "\savx\s" /proc/cpuinfo ; 
then echo "CPU: AVX found OK" if [ -e $CURDIR/libgomnivoicecpp-avx.so ]; then @@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then LIBRARY="$CURDIR/libgomnivoicecpp-avx512.so" fi fi + + export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH fi -export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH export OMNIVOICE_LIBRARY=$LIBRARY # If there is a lib/ld.so, use it diff --git a/backend/go/parakeet-cpp/Makefile b/backend/go/parakeet-cpp/Makefile index 2ea86a0c6..7fc46f8e2 100644 --- a/backend/go/parakeet-cpp/Makefile +++ b/backend/go/parakeet-cpp/Makefile @@ -1,6 +1,6 @@ # parakeet-cpp backend Makefile. # -# Upstream pin lives below as PARAKEET_VERSION?=92a5f0306be354c109150fe58ae4cc4f8a21ca45 +# Upstream pin lives below as PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a # (.github/bump_deps.sh) can find and update it - matches the # whisper.cpp / ds4 / vibevoice-cpp convention. # @@ -15,7 +15,7 @@ # That's what the L0 smoke test uses. The default target below does the # proper clone-at-pin + cmake build so CI doesn't need a side-checkout. 
-PARAKEET_VERSION?=92a5f0306be354c109150fe58ae4cc4f8a21ca45 +PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a PARAKEET_REPO?=https://github.com/mudler/parakeet.cpp GOCMD?=go @@ -74,6 +74,7 @@ libparakeet.so: sources/parakeet.cpp cmake -B sources/parakeet.cpp/build-shared -S sources/parakeet.cpp $(CMAKE_ARGS) cmake --build sources/parakeet.cpp/build-shared --config Release -j$(JOBS) cp -fv sources/parakeet.cpp/build-shared/libparakeet.so* ./ 2>/dev/null || true + cp -fv sources/parakeet.cpp/build-shared/libparakeet.dylib ./ 2>/dev/null || true cp -fv sources/parakeet.cpp/include/parakeet_capi.h ./ parakeet-cpp-grpc: libparakeet.so main.go goparakeetcpp.go diff --git a/backend/go/parakeet-cpp/main.go b/backend/go/parakeet-cpp/main.go index 963056e23..9c6466b13 100644 --- a/backend/go/parakeet-cpp/main.go +++ b/backend/go/parakeet-cpp/main.go @@ -2,15 +2,17 @@ package main // Started internally by LocalAI - one gRPC server per loaded model. // -// Loads libparakeet.so via purego and registers the flat C-API entry -// points declared in parakeet_capi.h. The library name can be overridden -// with PARAKEET_LIBRARY (mirrors the WHISPER_LIBRARY / VIBEVOICECPP_LIBRARY -// convention in the sibling backends); the default looks for the .so next -// to this binary. +// Loads the parakeet shared library via purego and registers the flat +// C-API entry points declared in parakeet_capi.h. The library name can be +// overridden with PARAKEET_LIBRARY (mirrors the WHISPER_LIBRARY / +// VIBEVOICECPP_LIBRARY convention in the sibling backends); the default +// looks next to this binary for libparakeet.so on Linux and +// libparakeet.dylib on macOS. 
import ( "flag" "fmt" "os" + "runtime" "github.com/ebitengine/purego" grpc "github.com/mudler/LocalAI/pkg/grpc" @@ -28,7 +30,11 @@ type LibFuncs struct { func main() { libName := os.Getenv("PARAKEET_LIBRARY") if libName == "" { - libName = "libparakeet.so" + if runtime.GOOS == "darwin" { + libName = "libparakeet.dylib" + } else { + libName = "libparakeet.so" + } } lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) diff --git a/backend/go/parakeet-cpp/package.sh b/backend/go/parakeet-cpp/package.sh index 7af2d7b59..af8e6b9e1 100755 --- a/backend/go/parakeet-cpp/package.sh +++ b/backend/go/parakeet-cpp/package.sh @@ -1,23 +1,71 @@ #!/bin/bash # -# L0 packaging stub: copy the binary, run.sh and libparakeet.so* into -# package/. The full ldd walk (libc, libstdc++, libgomp, GPU runtimes, -# arch detection) lands in L3, mirroring backend/go/whisper/package.sh. +# Bundle the parakeet-cpp-grpc binary, libparakeet.so, the core runtime +# libs (libc/libstdc++/libgomp + ld.so) and the GPU runtime for the active +# BUILD_TYPE so the package is self-contained. Mirrors +# backend/go/whisper/package.sh; run.sh routes the (CGO_ENABLED=0) binary +# through lib/ld.so so the packaged libc is used instead of the host's. set -e CURDIR=$(dirname "$(realpath "$0")") +REPO_ROOT="${CURDIR}/../../.." mkdir -p "$CURDIR/package/lib" cp -avf "$CURDIR/parakeet-cpp-grpc" "$CURDIR/package/" cp -avf "$CURDIR/run.sh" "$CURDIR/package/" -# libparakeet.so + any soname symlinks (libparakeet.so.X, libparakeet.so.X.Y). -cp -avf "$CURDIR"/libparakeet.so* "$CURDIR/package/lib/" 2>/dev/null || { - echo "ERROR: libparakeet.so not found in $CURDIR, run 'make' first" >&2 +# libparakeet shared lib + any soname symlinks. On Linux this is +# libparakeet.so[.X.Y]; on macOS it is libparakeet.dylib. purego.Dlopen +# resolves it via the *_LIBRARY_PATH that run.sh points at lib/. 
+cp -avf "$CURDIR"/libparakeet.so* "$CURDIR/package/lib/" 2>/dev/null || true +cp -avf "$CURDIR"/libparakeet.dylib "$CURDIR/package/lib/" 2>/dev/null || true +if ! ls "$CURDIR"/package/lib/libparakeet.* >/dev/null 2>&1; then + echo "ERROR: libparakeet shared library not found in $CURDIR, run 'make' first" >&2 exit 1 -} +fi -echo "L0 package layout (full ldd walk lands in L3):" +# Detect architecture and copy the core runtime libs libparakeet.so links +# against, plus the matching dynamic loader as lib/ld.so. +if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then + echo "Detected x86_64 architecture, copying x86_64 libraries..." + cp -arfLv /lib64/ld-linux-x86-64.so.2 "$CURDIR/package/lib/ld.so" + cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6" + cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1" + cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6" + cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6" + cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1" + cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2" + cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1" + cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0" +elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then + echo "Detected ARM64 architecture, copying ARM64 libraries..." 
+ cp -arfLv /lib/ld-linux-aarch64.so.1 "$CURDIR/package/lib/ld.so" + cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 "$CURDIR/package/lib/libc.so.6" + cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 "$CURDIR/package/lib/libgcc_s.so.1" + cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 "$CURDIR/package/lib/libstdc++.so.6" + cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 "$CURDIR/package/lib/libm.so.6" + cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 "$CURDIR/package/lib/libgomp.so.1" + cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 "$CURDIR/package/lib/libdl.so.2" + cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 "$CURDIR/package/lib/librt.so.1" + cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 "$CURDIR/package/lib/libpthread.so.0" +elif [ "$(uname -s)" = "Darwin" ]; then + echo "Detected Darwin — system frameworks linked dynamically, no bundled libs needed" +else + echo "Error: Could not detect architecture" + exit 1 +fi + +# Package GPU libraries (CUDA/ROCm/Intel/Vulkan loader + ICDs + drivers) +# based on BUILD_TYPE so the backend can reach the GPU without the runtime +# base image shipping those drivers. +GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh" +if [ -f "$GPU_LIB_SCRIPT" ]; then + echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..." 
+ source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib" + package_gpu_libs +fi + +echo "Packaging completed successfully" ls -liah "$CURDIR/package/" "$CURDIR/package/lib/" diff --git a/backend/go/parakeet-cpp/run.sh b/backend/go/parakeet-cpp/run.sh index 6f371d4f0..be859f381 100755 --- a/backend/go/parakeet-cpp/run.sh +++ b/backend/go/parakeet-cpp/run.sh @@ -3,11 +3,17 @@ set -e CURDIR=$(dirname "$(realpath "$0")") -export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}" +if [ "$(uname)" = "Darwin" ]; then + export DYLD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${DYLD_LIBRARY_PATH:-}" + export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.dylib" +else + export LD_LIBRARY_PATH="$CURDIR/lib:$CURDIR:${LD_LIBRARY_PATH:-}" + export PARAKEET_LIBRARY="$CURDIR/lib/libparakeet.so" +fi # If a self-contained ld.so was packaged, route through it so the # packaged libc / libstdc++ are used instead of the host's (matches the -# whisper backend's runtime layout). +# whisper backend's runtime layout). Linux only. 
if [ -f "$CURDIR/lib/ld.so" ]; then echo "Using lib/ld.so" exec "$CURDIR/lib/ld.so" "$CURDIR/parakeet-cpp-grpc" "$@" diff --git a/backend/go/qwen3-tts-cpp/Makefile b/backend/go/qwen3-tts-cpp/Makefile index e5f6a838f..c2bc6de34 100644 --- a/backend/go/qwen3-tts-cpp/Makefile +++ b/backend/go/qwen3-tts-cpp/Makefile @@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1) # qwentts.cpp version QWEN3TTS_REPO?=https://github.com/ServeurpersoCom/qwentts.cpp -QWEN3TTS_CPP_VERSION?=0bf4a18b22e8bb8718d95294e9f7f45c0d4270a4 +QWEN3TTS_CPP_VERSION?=9dbe7ea26a01b30fccb117ae5e86807c1dc23d42 SO_TARGET?=libgoqwen3ttscpp.so CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF @@ -65,8 +65,8 @@ UNAME_S := $(shell uname -s) ifeq ($(UNAME_S),Linux) VARIANT_TARGETS = libgoqwen3ttscpp-avx.so libgoqwen3ttscpp-avx2.so libgoqwen3ttscpp-avx512.so libgoqwen3ttscpp-fallback.so else - # On non-Linux (e.g., Darwin), build only fallback variant - VARIANT_TARGETS = libgoqwen3ttscpp-fallback.so + # On non-Linux (e.g., Darwin), build only fallback variant (as a dylib) + VARIANT_TARGETS = libgoqwen3ttscpp-fallback.dylib endif qwen3-tts-cpp: main.go goqwen3ttscpp.go $(VARIANT_TARGETS) @@ -78,7 +78,7 @@ package: qwen3-tts-cpp build: package clean: purge - rm -rf libgoqwen3ttscpp*.so package sources/qwentts.cpp qwen3-tts-cpp + rm -rf libgoqwen3ttscpp*.so libgoqwen3ttscpp*.dylib package sources/qwentts.cpp qwen3-tts-cpp purge: rm -rf build* @@ -110,13 +110,20 @@ libgoqwen3ttscpp-fallback.so: sources/qwentts.cpp SO_TARGET=libgoqwen3ttscpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom rm -rf build-libgoqwen3ttscpp-fallback.so +# Build fallback variant as a dylib (Darwin) +libgoqwen3ttscpp-fallback.dylib: sources/qwentts.cpp + $(info ${GREEN}I qwen3-tts-cpp build info:fallback (dylib)${RESET}) + SO_TARGET=libgoqwen3ttscpp-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off 
-DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgoqwen3ttscpp-custom + rm -rf build-libgoqwen3ttscpp-fallback.dylib + libgoqwen3ttscpp-custom: CMakeLists.txt cpp/goqwen3ttscpp.cpp cpp/goqwen3ttscpp.h mkdir -p build-$(SO_TARGET) && \ cd build-$(SO_TARGET) && \ cmake .. $(CMAKE_ARGS) && \ cmake --build . --config Release -j$(JOBS) --target goqwen3ttscpp && \ cd .. && \ - mv build-$(SO_TARGET)/libgoqwen3ttscpp.so ./$(SO_TARGET) + (mv build-$(SO_TARGET)/libgoqwen3ttscpp.so ./$(SO_TARGET) 2>/dev/null || \ + mv build-$(SO_TARGET)/libgoqwen3ttscpp.dylib ./$(SO_TARGET) 2>/dev/null) test: qwen3-tts-cpp @echo "Running qwen3-tts-cpp tests..." diff --git a/backend/go/qwen3-tts-cpp/main.go b/backend/go/qwen3-tts-cpp/main.go index b788229cd..041a23ad0 100644 --- a/backend/go/qwen3-tts-cpp/main.go +++ b/backend/go/qwen3-tts-cpp/main.go @@ -4,6 +4,7 @@ package main import ( "flag" "os" + "runtime" "github.com/ebitengine/purego" grpc "github.com/mudler/LocalAI/pkg/grpc" @@ -21,7 +22,11 @@ type LibFuncs struct { func main() { libName := os.Getenv("QWEN3TTS_LIBRARY") if libName == "" { - libName = "./libgoqwen3ttscpp-fallback.so" + if runtime.GOOS == "darwin" { + libName = "./libgoqwen3ttscpp-fallback.dylib" + } else { + libName = "./libgoqwen3ttscpp-fallback.so" + } } lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) diff --git a/backend/go/qwen3-tts-cpp/package.sh b/backend/go/qwen3-tts-cpp/package.sh index bb73df968..11d4c57c3 100755 --- a/backend/go/qwen3-tts-cpp/package.sh +++ b/backend/go/qwen3-tts-cpp/package.sh @@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.." 
mkdir -p $CURDIR/package/lib cp -avf $CURDIR/qwen3-tts-cpp $CURDIR/package/ -cp -fv $CURDIR/libgoqwen3ttscpp-*.so $CURDIR/package/ +cp -fv $CURDIR/libgoqwen3ttscpp-*.so $CURDIR/package/ 2>/dev/null || true +cp -fv $CURDIR/libgoqwen3ttscpp-*.dylib $CURDIR/package/ 2>/dev/null || true cp -fv $CURDIR/run.sh $CURDIR/package/ # Detect architecture and copy appropriate libraries diff --git a/backend/go/qwen3-tts-cpp/run.sh b/backend/go/qwen3-tts-cpp/run.sh index 6416779fa..638cf9661 100755 --- a/backend/go/qwen3-tts-cpp/run.sh +++ b/backend/go/qwen3-tts-cpp/run.sh @@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then grep -e "flags" /proc/cpuinfo | head -1 fi -LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.so" +if [ "$(uname)" = "Darwin" ]; then + # macOS: single dylib variant (Metal or Accelerate) + LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.dylib" + export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH +else + LIBRARY="$CURDIR/libgoqwen3ttscpp-fallback.so" -if [ "$(uname)" != "Darwin" ]; then if grep -q -e "\savx\s" /proc/cpuinfo ; then echo "CPU: AVX found OK" if [ -e $CURDIR/libgoqwen3ttscpp-avx.so ]; then @@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then LIBRARY="$CURDIR/libgoqwen3ttscpp-avx512.so" fi fi + + export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH fi -export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH export QWEN3TTS_LIBRARY=$LIBRARY # If there is a lib/ld.so, use it diff --git a/backend/go/rfdetr-cpp/Makefile b/backend/go/rfdetr-cpp/Makefile index 7c598f732..448a8e78b 100644 --- a/backend/go/rfdetr-cpp/Makefile +++ b/backend/go/rfdetr-cpp/Makefile @@ -34,6 +34,8 @@ else ifeq ($(BUILD_TYPE),hipblas) else ifeq ($(BUILD_TYPE),vulkan) CMAKE_ARGS+=-DGGML_VULKAN=ON -DRFDETR_GGML_VULKAN=ON else ifeq ($(OS),Darwin) + # macOS/Metal: built + published as an OCI image by CI (includeDarwin in + # .github/backend-matrix.yml) so Apple Silicon users can install this backend. 
ifneq ($(BUILD_TYPE),metal) CMAKE_ARGS+=-DGGML_METAL=OFF else @@ -71,7 +73,7 @@ ifeq ($(UNAME_S),Linux) VARIANT_TARGETS = librfdetrcpp-avx.so librfdetrcpp-avx2.so librfdetrcpp-avx512.so librfdetrcpp-fallback.so else # On non-Linux (e.g., Darwin), build only fallback variant - VARIANT_TARGETS = librfdetrcpp-fallback.so + VARIANT_TARGETS = librfdetrcpp-fallback.dylib endif rfdetr-cpp: main.go gorfdetrcpp.go $(VARIANT_TARGETS) @@ -83,7 +85,7 @@ package: rfdetr-cpp build: package clean: purge - rm -rf librfdetrcpp*.so rfdetr-cpp package sources + rm -rf librfdetrcpp*.so librfdetrcpp*.dylib rfdetr-cpp package sources purge: rm -rf build* @@ -110,11 +112,19 @@ librfdetrcpp-avx512.so: sources/rt-detr.cpp endif # Build fallback variant (all platforms) +ifeq ($(UNAME_S),Darwin) +librfdetrcpp-fallback.dylib: sources/rt-detr.cpp + rm -rfv build-$@ + $(info ${GREEN}I rfdetr-cpp build info:fallback${RESET}) + SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) librfdetrcpp-custom + rm -rfv build-$@ +else librfdetrcpp-fallback.so: sources/rt-detr.cpp rm -rfv build-$@ $(info ${GREEN}I rfdetr-cpp build info:fallback${RESET}) SO_TARGET=$@ CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) librfdetrcpp-custom rm -rfv build-$@ +endif librfdetrcpp-custom: CMakeLists.txt mkdir -p build-$(SO_TARGET) && \ @@ -122,7 +132,8 @@ librfdetrcpp-custom: CMakeLists.txt cmake .. $(CMAKE_ARGS) && \ cmake --build . --config Release -j$(JOBS) && \ cd .. 
&& \ - mv build-$(SO_TARGET)/librfdetrcpp.so ./$(SO_TARGET) + (mv build-$(SO_TARGET)/librfdetrcpp.so ./$(SO_TARGET) 2>/dev/null || \ + mv build-$(SO_TARGET)/librfdetrcpp.dylib ./$(SO_TARGET) 2>/dev/null) all: rfdetr-cpp package diff --git a/backend/go/rfdetr-cpp/main.go b/backend/go/rfdetr-cpp/main.go index 3c95df1c2..58637122a 100644 --- a/backend/go/rfdetr-cpp/main.go +++ b/backend/go/rfdetr-cpp/main.go @@ -9,6 +9,7 @@ package main import ( "flag" "os" + "runtime" "github.com/ebitengine/purego" grpc "github.com/mudler/LocalAI/pkg/grpc" @@ -27,7 +28,11 @@ func main() { // Get library name from environment variable, default to fallback libName := os.Getenv("RFDETR_LIBRARY") if libName == "" { - libName = "./librfdetrcpp-fallback.so" + if runtime.GOOS == "darwin" { + libName = "./librfdetrcpp-fallback.dylib" + } else { + libName = "./librfdetrcpp-fallback.so" + } } rfdetrLib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) diff --git a/backend/go/rfdetr-cpp/package.sh b/backend/go/rfdetr-cpp/package.sh index 9591b79dc..17319bf27 100755 --- a/backend/go/rfdetr-cpp/package.sh +++ b/backend/go/rfdetr-cpp/package.sh @@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.." 
# Create lib directory mkdir -p $CURDIR/package/lib -cp -avf $CURDIR/librfdetrcpp-*.so $CURDIR/package/ +cp -fv $CURDIR/librfdetrcpp-*.so $CURDIR/package/ 2>/dev/null || true +cp -fv $CURDIR/librfdetrcpp-*.dylib $CURDIR/package/ 2>/dev/null || true cp -avf $CURDIR/rfdetr-cpp $CURDIR/package/ cp -fv $CURDIR/run.sh $CURDIR/package/ diff --git a/backend/go/rfdetr-cpp/run.sh b/backend/go/rfdetr-cpp/run.sh index 042904e45..ffbd604dd 100755 --- a/backend/go/rfdetr-cpp/run.sh +++ b/backend/go/rfdetr-cpp/run.sh @@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then grep -e "flags" /proc/cpuinfo | head -1 fi -LIBRARY="$CURDIR/librfdetrcpp-fallback.so" +if [ "$(uname)" = "Darwin" ]; then + # macOS: single dylib variant (Metal or Accelerate) + LIBRARY="$CURDIR/librfdetrcpp-fallback.dylib" + export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH +else + LIBRARY="$CURDIR/librfdetrcpp-fallback.so" -if [ "$(uname)" != "Darwin" ]; then if grep -q -e "\savx\s" /proc/cpuinfo ; then echo "CPU: AVX found OK" if [ -e $CURDIR/librfdetrcpp-avx.so ]; then @@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then LIBRARY="$CURDIR/librfdetrcpp-avx512.so" fi fi + + export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH fi -export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH export RFDETR_LIBRARY=$LIBRARY # If there is a lib/ld.so, use it diff --git a/backend/go/sam3-cpp/Makefile b/backend/go/sam3-cpp/Makefile index 53b0dfb5e..f91bb356a 100644 --- a/backend/go/sam3-cpp/Makefile +++ b/backend/go/sam3-cpp/Makefile @@ -31,6 +31,8 @@ else ifeq ($(BUILD_TYPE),hipblas) else ifeq ($(BUILD_TYPE),vulkan) CMAKE_ARGS+=-DGGML_VULKAN=ON else ifeq ($(OS),Darwin) + # macOS/Metal: built + published as an OCI image by CI (includeDarwin in + # .github/backend-matrix.yml) so Apple Silicon users can install this backend. 
ifneq ($(BUILD_TYPE),metal) CMAKE_ARGS+=-DGGML_METAL=OFF else @@ -66,7 +68,7 @@ ifeq ($(UNAME_S),Linux) VARIANT_TARGETS = libgosam3-avx.so libgosam3-avx2.so libgosam3-avx512.so libgosam3-fallback.so else # On non-Linux (e.g., Darwin), build only fallback variant - VARIANT_TARGETS = libgosam3-fallback.so + VARIANT_TARGETS = libgosam3-fallback.dylib endif sam3-cpp: main.go gosam3.go $(VARIANT_TARGETS) @@ -78,7 +80,7 @@ package: sam3-cpp build: package clean: purge - rm -rf libgosam3*.so sam3-cpp package sources + rm -rf libgosam3*.so libgosam3*.dylib sam3-cpp package sources purge: rm -rf build* @@ -105,11 +107,19 @@ libgosam3-avx512.so: sources/sam3.cpp endif # Build fallback variant (all platforms) +ifeq ($(UNAME_S),Darwin) +libgosam3-fallback.dylib: sources/sam3.cpp + $(MAKE) purge + $(info ${GREEN}I sam3-cpp build info:fallback${RESET}) + SO_TARGET=libgosam3-fallback.dylib CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosam3-custom + rm -rfv build* +else libgosam3-fallback.so: sources/sam3.cpp $(MAKE) purge $(info ${GREEN}I sam3-cpp build info:fallback${RESET}) SO_TARGET=libgosam3-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgosam3-custom rm -rfv build* +endif libgosam3-custom: CMakeLists.txt cpp/gosam3.cpp cpp/gosam3.h mkdir -p build-$(SO_TARGET) && \ @@ -117,6 +127,7 @@ libgosam3-custom: CMakeLists.txt cpp/gosam3.cpp cpp/gosam3.h cmake .. $(CMAKE_ARGS) && \ cmake --build . --config Release -j$(JOBS) && \ cd .. 
&& \ - mv build-$(SO_TARGET)/libgosam3.so ./$(SO_TARGET) + (mv build-$(SO_TARGET)/libgosam3.so ./$(SO_TARGET) 2>/dev/null || \ + mv build-$(SO_TARGET)/libgosam3.dylib ./$(SO_TARGET) 2>/dev/null) all: sam3-cpp package diff --git a/backend/go/sam3-cpp/main.go b/backend/go/sam3-cpp/main.go index c83a59285..e36849f69 100644 --- a/backend/go/sam3-cpp/main.go +++ b/backend/go/sam3-cpp/main.go @@ -3,6 +3,7 @@ package main import ( "flag" "os" + "runtime" "github.com/ebitengine/purego" grpc "github.com/mudler/LocalAI/pkg/grpc" @@ -21,7 +22,11 @@ func main() { // Get library name from environment variable, default to fallback libName := os.Getenv("SAM3_LIBRARY") if libName == "" { - libName = "./libgosam3-fallback.so" + if runtime.GOOS == "darwin" { + libName = "./libgosam3-fallback.dylib" + } else { + libName = "./libgosam3-fallback.so" + } } gosamLib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) diff --git a/backend/go/sam3-cpp/package.sh b/backend/go/sam3-cpp/package.sh index 254aef286..a648ee93c 100755 --- a/backend/go/sam3-cpp/package.sh +++ b/backend/go/sam3-cpp/package.sh @@ -10,7 +10,8 @@ REPO_ROOT="${CURDIR}/../../.." 
# Create lib directory mkdir -p $CURDIR/package/lib -cp -avf $CURDIR/libgosam3-*.so $CURDIR/package/ +cp -fv $CURDIR/libgosam3-*.so $CURDIR/package/ 2>/dev/null || true +cp -fv $CURDIR/libgosam3-*.dylib $CURDIR/package/ 2>/dev/null || true cp -avf $CURDIR/sam3-cpp $CURDIR/package/ cp -fv $CURDIR/run.sh $CURDIR/package/ diff --git a/backend/go/sam3-cpp/run.sh b/backend/go/sam3-cpp/run.sh index 423ed9199..7bff52df6 100755 --- a/backend/go/sam3-cpp/run.sh +++ b/backend/go/sam3-cpp/run.sh @@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then grep -e "flags" /proc/cpuinfo | head -1 fi -LIBRARY="$CURDIR/libgosam3-fallback.so" +if [ "$(uname)" = "Darwin" ]; then + # macOS: single dylib variant (Metal or Accelerate) + LIBRARY="$CURDIR/libgosam3-fallback.dylib" + export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH +else + LIBRARY="$CURDIR/libgosam3-fallback.so" -if [ "$(uname)" != "Darwin" ]; then if grep -q -e "\savx\s" /proc/cpuinfo ; then echo "CPU: AVX found OK" if [ -e $CURDIR/libgosam3-avx.so ]; then @@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then LIBRARY="$CURDIR/libgosam3-avx512.so" fi fi + + export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH fi -export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH export SAM3_LIBRARY=$LIBRARY # If there is a lib/ld.so, use it diff --git a/backend/go/sherpa-onnx/backend.go b/backend/go/sherpa-onnx/backend.go index 0a092acf7..8bfe5e75c 100644 --- a/backend/go/sherpa-onnx/backend.go +++ b/backend/go/sherpa-onnx/backend.go @@ -7,6 +7,7 @@ import ( "fmt" "os" "path/filepath" + "runtime" "strconv" "strings" "sync" @@ -238,11 +239,19 @@ func loadSherpaLibs() error { func loadSherpaLibsOnce() error { shimLib := os.Getenv("SHERPA_SHIM_LIBRARY") if shimLib == "" { - shimLib = "libsherpa-shim.so" + if runtime.GOOS == "darwin" { + shimLib = "libsherpa-shim.dylib" + } else { + shimLib = "libsherpa-shim.so" + } } capiLib := os.Getenv("SHERPA_ONNX_LIBRARY") if capiLib == "" { - capiLib = "libsherpa-onnx-c-api.so" + if runtime.GOOS == 
"darwin" { + capiLib = "libsherpa-onnx-c-api.dylib" + } else { + capiLib = "libsherpa-onnx-c-api.so" + } } shim, err := purego.Dlopen(shimLib, purego.RTLD_NOW|purego.RTLD_GLOBAL) diff --git a/backend/go/sherpa-onnx/run.sh b/backend/go/sherpa-onnx/run.sh index b703e5155..771324326 100755 --- a/backend/go/sherpa-onnx/run.sh +++ b/backend/go/sherpa-onnx/run.sh @@ -3,7 +3,13 @@ set -ex CURDIR=$(dirname "$(realpath $0)") -export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH +if [ "$(uname)" = "Darwin" ]; then + export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH + export SHERPA_SHIM_LIBRARY=$CURDIR/lib/libsherpa-shim.dylib + export SHERPA_ONNX_LIBRARY=$CURDIR/lib/libsherpa-onnx-c-api.dylib +else + export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH +fi if [ -f $CURDIR/lib/ld.so ]; then echo "Using lib/ld.so" diff --git a/backend/go/stablediffusion-ggml/Makefile b/backend/go/stablediffusion-ggml/Makefile index d6d03adab..7a9917ea8 100644 --- a/backend/go/stablediffusion-ggml/Makefile +++ b/backend/go/stablediffusion-ggml/Makefile @@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1) # stablediffusion.cpp (ggml) STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp -STABLEDIFFUSION_GGML_VERSION?=7f0e728b7d42f2490dfa5dd9539082d904f2f6b2 +STABLEDIFFUSION_GGML_VERSION?=8caa3f908ae6d4a4bef531e73b9a969f266a3d1f CMAKE_ARGS+=-DGGML_MAX_NAME=128 @@ -131,6 +131,7 @@ libgosd-custom: CMakeLists.txt cpp/gosd.cpp cpp/gosd.h cmake .. $(CMAKE_ARGS) && \ cmake --build . --config Release -j$(JOBS) && \ cd .. 
&& \ - mv build-$(SO_TARGET)/libgosd.so ./$(SO_TARGET) + (mv build-$(SO_TARGET)/libgosd.so ./$(SO_TARGET) 2>/dev/null || \ + mv build-$(SO_TARGET)/libgosd.dylib ./$(SO_TARGET) 2>/dev/null) all: stablediffusion-ggml package \ No newline at end of file diff --git a/backend/go/stablediffusion-ggml/main.go b/backend/go/stablediffusion-ggml/main.go index 998f2a5ab..b509c6a2b 100644 --- a/backend/go/stablediffusion-ggml/main.go +++ b/backend/go/stablediffusion-ggml/main.go @@ -3,6 +3,7 @@ package main import ( "flag" "os" + "runtime" "github.com/ebitengine/purego" grpc "github.com/mudler/LocalAI/pkg/grpc" @@ -21,7 +22,11 @@ func main() { // Get library name from environment variable, default to fallback libName := os.Getenv("SD_LIBRARY") if libName == "" { - libName = "./libgosd-fallback.so" + if runtime.GOOS == "darwin" { + libName = "./libgosd-fallback.dylib" + } else { + libName = "./libgosd-fallback.so" + } } gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) diff --git a/backend/go/stablediffusion-ggml/package.sh b/backend/go/stablediffusion-ggml/package.sh index 8006e081f..922fb71ea 100755 --- a/backend/go/stablediffusion-ggml/package.sh +++ b/backend/go/stablediffusion-ggml/package.sh @@ -12,6 +12,7 @@ REPO_ROOT="${CURDIR}/../../.." mkdir -p $CURDIR/package/lib cp -avf $CURDIR/libgosd-*.so $CURDIR/package/ +cp -fv $CURDIR/libgosd-*.dylib $CURDIR/package/ 2>/dev/null || true cp -avf $CURDIR/stablediffusion-ggml $CURDIR/package/ cp -fv $CURDIR/run.sh $CURDIR/package/ diff --git a/backend/go/stablediffusion-ggml/run.sh b/backend/go/stablediffusion-ggml/run.sh index 71342e43b..e026b4b28 100755 --- a/backend/go/stablediffusion-ggml/run.sh +++ b/backend/go/stablediffusion-ggml/run.sh @@ -12,9 +12,18 @@ if [ "$(uname)" != "Darwin" ]; then grep -e "flags" /proc/cpuinfo | head -1 fi -LIBRARY="$CURDIR/libgosd-fallback.so" +if [ "$(uname)" = "Darwin" ]; then + # macOS: single library variant (Metal or Accelerate). 
The gosd target is + # built as a CMake MODULE; on Apple, CMake names SHARED libraries *.dylib + # but MODULE libraries *.so, so prefer .dylib and fall back to .so. + LIBRARY="$CURDIR/libgosd-fallback.dylib" + if [ ! -e "$LIBRARY" ]; then + LIBRARY="$CURDIR/libgosd-fallback.so" + fi + export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH +else + LIBRARY="$CURDIR/libgosd-fallback.so" -if [ "$(uname)" != "Darwin" ]; then if grep -q -e "\savx\s" /proc/cpuinfo ; then echo "CPU: AVX found OK" if [ -e $CURDIR/libgosd-avx.so ]; then @@ -36,9 +45,10 @@ if [ "$(uname)" != "Darwin" ]; then LIBRARY="$CURDIR/libgosd-avx512.so" fi fi + + export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH fi -export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH export SD_LIBRARY=$LIBRARY # If there is a lib/ld.so, use it diff --git a/backend/go/supertonic/helper.go b/backend/go/supertonic/helper.go index 9f927d5d3..884077e75 100644 --- a/backend/go/supertonic/helper.go +++ b/backend/go/supertonic/helper.go @@ -16,6 +16,7 @@ import ( "os" "path/filepath" "regexp" + "runtime" "strings" "time" "unicode" @@ -943,7 +944,13 @@ func InitializeONNXRuntime() error { } } if libPath == "" { - libPath = "/usr/local/lib/libonnxruntime.so" + // LocalAI: default to the platform-native shared library + // extension when nothing else is found (dyld vs ld.so).
+ if runtime.GOOS == "darwin" { + libPath = "/usr/local/lib/libonnxruntime.dylib" + } else { + libPath = "/usr/local/lib/libonnxruntime.so" + } } } ort.SetSharedLibraryPath(libPath) diff --git a/backend/go/supertonic/package.sh b/backend/go/supertonic/package.sh index 9e2a01625..678ca5ead 100755 --- a/backend/go/supertonic/package.sh +++ b/backend/go/supertonic/package.sh @@ -32,6 +32,10 @@ elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2 cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1 cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0 +elif [ $(uname -s) = "Darwin" ]; then + # macOS: dyld resolves the bundled .dylib via DYLD_LIBRARY_PATH (set in + # run.sh); there is no ld.so loader nor glibc to bundle. + echo "Detected Darwin" else echo "Error: Could not detect architecture" exit 1 diff --git a/backend/go/supertonic/run.sh b/backend/go/supertonic/run.sh index 2dabf7eb3..683c52ab2 100755 --- a/backend/go/supertonic/run.sh +++ b/backend/go/supertonic/run.sh @@ -3,12 +3,19 @@ set -ex CURDIR=$(dirname "$(realpath $0)") -export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH -export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.so +if [ "$(uname)" = "Darwin" ]; then + # macOS uses dyld: there is no ld.so loader, and the search path env + # var is DYLD_LIBRARY_PATH. ONNX Runtime ships as a .dylib here. 
+ export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH + export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.dylib +else + export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH + export ONNXRUNTIME_LIB_PATH=$CURDIR/lib/libonnxruntime.so -if [ -f $CURDIR/lib/ld.so ]; then - echo "Using lib/ld.so" - exec $CURDIR/lib/ld.so $CURDIR/supertonic "$@" + if [ -f $CURDIR/lib/ld.so ]; then + echo "Using lib/ld.so" + exec $CURDIR/lib/ld.so $CURDIR/supertonic "$@" + fi fi exec $CURDIR/supertonic "$@" diff --git a/backend/go/vibevoice-cpp/Makefile b/backend/go/vibevoice-cpp/Makefile index 199df9cc4..dc71eaa5d 100644 --- a/backend/go/vibevoice-cpp/Makefile +++ b/backend/go/vibevoice-cpp/Makefile @@ -70,8 +70,8 @@ UNAME_S := $(shell uname -s) ifeq ($(UNAME_S),Linux) VARIANT_TARGETS = libgovibevoicecpp-avx.so libgovibevoicecpp-avx2.so libgovibevoicecpp-avx512.so libgovibevoicecpp-fallback.so else - # On non-Linux (e.g., Darwin), build only fallback variant - VARIANT_TARGETS = libgovibevoicecpp-fallback.so + # On non-Linux (e.g., Darwin), build only fallback variant (as a dylib) + VARIANT_TARGETS = libgovibevoicecpp-fallback.dylib endif vibevoice-cpp: main.go govibevoicecpp.go $(VARIANT_TARGETS) @@ -83,7 +83,7 @@ package: vibevoice-cpp build: package clean: purge - rm -rf libgovibevoicecpp*.so package sources/vibevoice.cpp vibevoice-cpp + rm -rf libgovibevoicecpp*.so libgovibevoicecpp*.dylib package sources/vibevoice.cpp vibevoice-cpp purge: rm -rf build* @@ -119,13 +119,21 @@ libgovibevoicecpp-fallback.so: sources/vibevoice.cpp SO_TARGET=libgovibevoicecpp-fallback.so CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgovibevoicecpp-custom rm -rfv build* +# Build fallback variant as a dylib (Darwin) +libgovibevoicecpp-fallback.dylib: sources/vibevoice.cpp + $(MAKE) purge + $(info ${GREEN}I vibevoice-cpp build info:fallback (dylib)${RESET}) + SO_TARGET=libgovibevoicecpp-fallback.dylib 
CMAKE_ARGS="$(CMAKE_ARGS) -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" $(MAKE) libgovibevoicecpp-custom + rm -rfv build* + libgovibevoicecpp-custom: CMakeLists.txt cpp/govibevoicecpp.cpp cpp/govibevoicecpp.h mkdir -p build-$(SO_TARGET) && \ cd build-$(SO_TARGET) && \ cmake .. $(CMAKE_ARGS) && \ cmake --build . --config Release -j$(JOBS) --target govibevoicecpp && \ cd .. && \ - mv build-$(SO_TARGET)/libgovibevoicecpp.so ./$(SO_TARGET) + (mv build-$(SO_TARGET)/libgovibevoicecpp.so ./$(SO_TARGET) 2>/dev/null || \ + mv build-$(SO_TARGET)/libgovibevoicecpp.dylib ./$(SO_TARGET) 2>/dev/null) test: vibevoice-cpp @echo "Running vibevoice-cpp tests..." diff --git a/backend/go/vibevoice-cpp/main.go b/backend/go/vibevoice-cpp/main.go index dd1f1ba43..b9a696d82 100644 --- a/backend/go/vibevoice-cpp/main.go +++ b/backend/go/vibevoice-cpp/main.go @@ -4,6 +4,7 @@ package main import ( "flag" "os" + "runtime" "github.com/ebitengine/purego" grpc "github.com/mudler/LocalAI/pkg/grpc" @@ -21,7 +22,11 @@ type LibFuncs struct { func main() { libName := os.Getenv("VIBEVOICECPP_LIBRARY") if libName == "" { - libName = "./libgovibevoicecpp-fallback.so" + if runtime.GOOS == "darwin" { + libName = "./libgovibevoicecpp-fallback.dylib" + } else { + libName = "./libgovibevoicecpp-fallback.so" + } } lib, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) diff --git a/backend/go/vibevoice-cpp/package.sh b/backend/go/vibevoice-cpp/package.sh index 88010846f..62860b8d6 100755 --- a/backend/go/vibevoice-cpp/package.sh +++ b/backend/go/vibevoice-cpp/package.sh @@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.." 
mkdir -p $CURDIR/package/lib cp -avf $CURDIR/vibevoice-cpp $CURDIR/package/ -cp -fv $CURDIR/libgovibevoicecpp-*.so $CURDIR/package/ +cp -fv $CURDIR/libgovibevoicecpp-*.so $CURDIR/package/ 2>/dev/null || true +cp -fv $CURDIR/libgovibevoicecpp-*.dylib $CURDIR/package/ 2>/dev/null || true cp -fv $CURDIR/run.sh $CURDIR/package/ # Detect architecture and copy appropriate libraries diff --git a/backend/go/vibevoice-cpp/run.sh b/backend/go/vibevoice-cpp/run.sh index 93e92d5b8..ec5a39c14 100755 --- a/backend/go/vibevoice-cpp/run.sh +++ b/backend/go/vibevoice-cpp/run.sh @@ -11,9 +11,13 @@ if [ "$(uname)" != "Darwin" ]; then grep -e "flags" /proc/cpuinfo | head -1 fi -LIBRARY="$CURDIR/libgovibevoicecpp-fallback.so" +if [ "$(uname)" = "Darwin" ]; then + # macOS: single dylib variant (Metal or Accelerate) + LIBRARY="$CURDIR/libgovibevoicecpp-fallback.dylib" + export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH +else + LIBRARY="$CURDIR/libgovibevoicecpp-fallback.so" -if [ "$(uname)" != "Darwin" ]; then if grep -q -e "\savx\s" /proc/cpuinfo ; then echo "CPU: AVX found OK" if [ -e $CURDIR/libgovibevoicecpp-avx.so ]; then @@ -34,9 +38,10 @@ if [ "$(uname)" != "Darwin" ]; then LIBRARY="$CURDIR/libgovibevoicecpp-avx512.so" fi fi + + export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH fi -export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH export VIBEVOICECPP_LIBRARY=$LIBRARY if [ -f $CURDIR/lib/ld.so ]; then diff --git a/backend/go/whisper/Makefile b/backend/go/whisper/Makefile index e291e4d62..6dd13dd2c 100644 --- a/backend/go/whisper/Makefile +++ b/backend/go/whisper/Makefile @@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1) # whisper.cpp version WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp -WHISPER_CPP_VERSION?=86c40c3bd6fc86f1187fb751d111b49e0fc18e84 +WHISPER_CPP_VERSION?=43d78af5be58f41d6ffbc227d608f104577741ea SO_TARGET?=libgowhisper.so CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF @@ -117,6 +117,7 @@ libgowhisper-custom: CMakeLists.txt cpp/gowhisper.cpp cpp/gowhisper.h cmake 
.. $(CMAKE_ARGS) && \ cmake --build . --config Release -j$(JOBS) && \ cd .. && \ - mv build-$(SO_TARGET)/libgowhisper.so ./$(SO_TARGET) + (mv build-$(SO_TARGET)/libgowhisper.so ./$(SO_TARGET) 2>/dev/null || \ + mv build-$(SO_TARGET)/libgowhisper.dylib ./$(SO_TARGET:.so=.dylib)) all: whisper package diff --git a/backend/go/whisper/main.go b/backend/go/whisper/main.go index e48b24519..ab102f4c4 100644 --- a/backend/go/whisper/main.go +++ b/backend/go/whisper/main.go @@ -4,6 +4,7 @@ package main import ( "flag" "os" + "runtime" "github.com/ebitengine/purego" grpc "github.com/mudler/LocalAI/pkg/grpc" @@ -22,7 +23,11 @@ func main() { // Get library name from environment variable, default to fallback libName := os.Getenv("WHISPER_LIBRARY") if libName == "" { - libName = "./libgowhisper-fallback.so" + if runtime.GOOS == "darwin" { + libName = "./libgowhisper-fallback.dylib" + } else { + libName = "./libgowhisper-fallback.so" + } } gosd, err := purego.Dlopen(libName, purego.RTLD_NOW|purego.RTLD_GLOBAL) diff --git a/backend/go/whisper/package.sh b/backend/go/whisper/package.sh index dfecdf5c6..efeaa7009 100755 --- a/backend/go/whisper/package.sh +++ b/backend/go/whisper/package.sh @@ -12,7 +12,8 @@ REPO_ROOT="${CURDIR}/../../.."
mkdir -p $CURDIR/package/lib cp -avf $CURDIR/whisper $CURDIR/package/ -cp -fv $CURDIR/libgowhisper-*.so $CURDIR/package/ +cp -fv $CURDIR/libgowhisper-*.so $CURDIR/package/ 2>/dev/null || true +cp -fv $CURDIR/libgowhisper-*.dylib $CURDIR/package/ 2>/dev/null || true cp -fv $CURDIR/run.sh $CURDIR/package/ # Detect architecture and copy appropriate libraries diff --git a/backend/go/whisper/run.sh b/backend/go/whisper/run.sh index 1af2c0535..0e2bd7eb0 100755 --- a/backend/go/whisper/run.sh +++ b/backend/go/whisper/run.sh @@ -12,9 +12,13 @@ if [ "$(uname)" != "Darwin" ]; then grep -e "flags" /proc/cpuinfo | head -1 fi -LIBRARY="$CURDIR/libgowhisper-fallback.so" +if [ "$(uname)" = "Darwin" ]; then + # macOS: single dylib variant (Metal or Accelerate) + LIBRARY="$CURDIR/libgowhisper-fallback.dylib" + export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH +else + LIBRARY="$CURDIR/libgowhisper-fallback.so" -if [ "$(uname)" != "Darwin" ]; then if grep -q -e "\savx\s" /proc/cpuinfo ; then echo "CPU: AVX found OK" if [ -e $CURDIR/libgowhisper-avx.so ]; then @@ -36,9 +40,10 @@ if [ "$(uname)" != "Darwin" ]; then LIBRARY="$CURDIR/libgowhisper-avx512.so" fi fi + + export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH fi -export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH export WHISPER_LIBRARY=$LIBRARY # If there is a lib/ld.so, use it diff --git a/backend/index.yaml b/backend/index.yaml index 97fd1eb28..a7399e20d 100644 --- a/backend/index.yaml +++ b/backend/index.yaml @@ -178,6 +178,37 @@ nvidia-cuda-12: "cuda12-parakeet-cpp" nvidia-l4t-cuda-12: "nvidia-l4t-arm64-parakeet-cpp" nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-parakeet-cpp" +- &ced + name: "ced" + alias: "ced" + license: mit + icon: https://avatars.githubusercontent.com/u/95302084 + description: | + CED sound-event classification / audio tagging (527-class AudioSet). 
+ ced.cpp is a C++/ggml port that performs audio tagging over the AudioSet + taxonomy, exposed through the SoundDetection gRPC rpc and the + /v1/audio/classification REST endpoint. It runs on CPU, NVIDIA CUDA, + AMD ROCm/HIP, Intel SYCL, Vulkan, NVIDIA Jetson (L4T) and Apple Silicon (Metal) targets. + urls: + - https://github.com/mudler/ced.cpp + tags: + - audio-classification + - CPU + - GPU + - CUDA + - HIP + capabilities: + default: "cpu-ced" + nvidia: "cuda12-ced" + intel: "intel-sycl-f16-ced" + metal: "metal-ced" + amd: "rocm-ced" + vulkan: "vulkan-ced" + nvidia-l4t: "nvidia-l4t-arm64-ced" + nvidia-cuda-13: "cuda13-ced" + nvidia-cuda-12: "cuda12-ced" + nvidia-l4t-cuda-12: "nvidia-l4t-arm64-ced" + nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-ced" - &voxtral name: "voxtral" alias: "voxtral" @@ -309,6 +340,7 @@ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sam3-cpp" intel: "intel-sycl-f32-sam3-cpp" vulkan: "vulkan-sam3-cpp" + metal: "metal-sam3-cpp" - &rfdetrcpp name: "rfdetr-cpp" alias: "rfdetr-cpp" @@ -337,6 +369,7 @@ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-rfdetr-cpp" intel: "intel-sycl-f32-rfdetr-cpp" vulkan: "vulkan-rfdetr-cpp" + metal: "metal-rfdetr-cpp" - &locateanything name: "locate-anything" alias: "locate-anything" @@ -366,6 +399,7 @@ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-locate-anything-cpp" intel: "intel-sycl-f32-locate-anything-cpp" vulkan: "vulkan-locate-anything-cpp" + metal: "metal-locate-anything-cpp" - !!merge <<: *locateanything name: "locate-anything-development" capabilities: @@ -378,6 +412,7 @@ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-locate-anything-cpp-development" intel: "intel-sycl-f32-locate-anything-cpp-development" vulkan: "vulkan-locate-anything-cpp-development" + metal: "metal-locate-anything-cpp-development" - !!merge <<: *locateanything name: "cpu-locate-anything-cpp" uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-locate-anything-cpp" @@ -388,6 +423,16 @@ uri: 
"quay.io/go-skynet/local-ai-backends:master-cpu-locate-anything-cpp"
mirrors: - localai/localai-backends:master-cpu-locate-anything-cpp +- !!merge <<: *locateanything + name: "metal-locate-anything-cpp" + uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-locate-anything-cpp" + mirrors: + - localai/localai-backends:latest-metal-darwin-arm64-locate-anything-cpp +- !!merge <<: *locateanything + name: "metal-locate-anything-cpp-development" + uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-locate-anything-cpp" + mirrors: + - localai/localai-backends:master-metal-darwin-arm64-locate-anything-cpp - !!merge <<: *locateanything name: "cuda12-locate-anything-cpp" uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-locate-anything-cpp" @@ -486,6 +531,7 @@ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-depth-anything-cpp" intel: "intel-sycl-f32-depth-anything-cpp" vulkan: "vulkan-depth-anything-cpp" + metal: "metal-depth-anything-cpp" - !!merge <<: *depthanything name: "depth-anything-development" capabilities: @@ -498,6 +544,7 @@ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-depth-anything-cpp-development" intel: "intel-sycl-f32-depth-anything-cpp-development" vulkan: "vulkan-depth-anything-cpp-development" + metal: "metal-depth-anything-cpp-development" - !!merge <<: *depthanything name: "cpu-depth-anything-cpp" uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-depth-anything-cpp" @@ -508,6 +555,16 @@ uri: "quay.io/go-skynet/local-ai-backends:master-cpu-depth-anything-cpp" mirrors: - localai/localai-backends:master-cpu-depth-anything-cpp +- !!merge <<: *depthanything + name: "metal-depth-anything-cpp" + uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-depth-anything-cpp" + mirrors: + - localai/localai-backends:latest-metal-darwin-arm64-depth-anything-cpp +- !!merge <<: *depthanything + name: "metal-depth-anything-cpp-development" + uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-depth-anything-cpp" + mirrors: + - 
localai/localai-backends:master-metal-darwin-arm64-depth-anything-cpp - !!merge <<: *depthanything name: "cuda12-depth-anything-cpp" uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-depth-anything-cpp" @@ -614,6 +671,7 @@ nvidia-cuda-13: "cuda13-vllm" nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm" cpu: "cpu-vllm" + metal: "metal-vllm" - &sglang name: "sglang" license: apache-2.0 @@ -999,6 +1057,8 @@ nvidia-l4t: "vulkan-localvqe" nvidia-l4t-cuda-12: "vulkan-localvqe" nvidia-l4t-cuda-13: "vulkan-localvqe" + # Apple Silicon: CPU build (LocalVQE has no Metal path); still arm64-native. + metal: "metal-localvqe" - &privacyfilter name: "privacy-filter" alias: "privacy-filter" @@ -1253,6 +1313,7 @@ nvidia-cuda-13: "cuda13-liquid-audio" nvidia-cuda-12: "cuda12-liquid-audio" nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-liquid-audio" + metal: "metal-liquid-audio" icon: https://cdn-avatars.huggingface.co/v1/production/uploads/61b8e2ba285851687028d395/7_6D7rWrLxp2hb6OHSV1p.png - &qwen-tts urls: @@ -1538,6 +1599,7 @@ - TTS capabilities: default: "cpu-supertonic" + metal: "metal-supertonic" - !!merge <<: *neutts name: "neutts-development" capabilities: @@ -2650,6 +2712,121 @@ uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-parakeet-cpp" mirrors: - localai/localai-backends:master-gpu-nvidia-cuda-13-parakeet-cpp +## ced +- !!merge <<: *ced + name: "ced-development" + capabilities: + default: "cpu-ced-development" + nvidia: "cuda12-ced-development" + intel: "intel-sycl-f16-ced-development" + metal: "metal-ced-development" + amd: "rocm-ced-development" + vulkan: "vulkan-ced-development" + nvidia-l4t: "nvidia-l4t-arm64-ced-development" + nvidia-cuda-13: "cuda13-ced-development" + nvidia-cuda-12: "cuda12-ced-development" + nvidia-l4t-cuda-12: "nvidia-l4t-arm64-ced-development" + nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-ced-development" +- !!merge <<: *ced + name: "nvidia-l4t-arm64-ced" + uri: 
"quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-ced" + mirrors: + - localai/localai-backends:latest-nvidia-l4t-arm64-ced +- !!merge <<: *ced + name: "nvidia-l4t-arm64-ced-development" + uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-ced" + mirrors: + - localai/localai-backends:master-nvidia-l4t-arm64-ced +- !!merge <<: *ced + name: "cuda13-nvidia-l4t-arm64-ced" + uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-ced" + mirrors: + - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-ced +- !!merge <<: *ced + name: "cuda13-nvidia-l4t-arm64-ced-development" + uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-ced" + mirrors: + - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-ced +- !!merge <<: *ced + name: "cpu-ced" + uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-ced" + mirrors: + - localai/localai-backends:latest-cpu-ced +- !!merge <<: *ced + name: "cpu-ced-development" + uri: "quay.io/go-skynet/local-ai-backends:master-cpu-ced" + mirrors: + - localai/localai-backends:master-cpu-ced +- !!merge <<: *ced + name: "metal-ced" + uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-ced" + mirrors: + - localai/localai-backends:latest-metal-darwin-arm64-ced +- !!merge <<: *ced + name: "metal-ced-development" + uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-ced" + mirrors: + - localai/localai-backends:master-metal-darwin-arm64-ced +- !!merge <<: *ced + name: "cuda12-ced" + uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-ced" + mirrors: + - localai/localai-backends:latest-gpu-nvidia-cuda-12-ced +- !!merge <<: *ced + name: "cuda12-ced-development" + uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-ced" + mirrors: + - localai/localai-backends:master-gpu-nvidia-cuda-12-ced +- !!merge <<: *ced + name: "rocm-ced" + uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-ced" + mirrors: + - 
localai/localai-backends:latest-gpu-rocm-hipblas-ced +- !!merge <<: *ced + name: "rocm-ced-development" + uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-ced" + mirrors: + - localai/localai-backends:master-gpu-rocm-hipblas-ced +- !!merge <<: *ced + name: "intel-sycl-f32-ced" + uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-ced" + mirrors: + - localai/localai-backends:latest-gpu-intel-sycl-f32-ced +- !!merge <<: *ced + name: "intel-sycl-f32-ced-development" + uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-ced" + mirrors: + - localai/localai-backends:master-gpu-intel-sycl-f32-ced +- !!merge <<: *ced + name: "intel-sycl-f16-ced" + uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-ced" + mirrors: + - localai/localai-backends:latest-gpu-intel-sycl-f16-ced +- !!merge <<: *ced + name: "intel-sycl-f16-ced-development" + uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-ced" + mirrors: + - localai/localai-backends:master-gpu-intel-sycl-f16-ced +- !!merge <<: *ced + name: "vulkan-ced" + uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-ced" + mirrors: + - localai/localai-backends:latest-gpu-vulkan-ced +- !!merge <<: *ced + name: "vulkan-ced-development" + uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-ced" + mirrors: + - localai/localai-backends:master-gpu-vulkan-ced +- !!merge <<: *ced + name: "cuda13-ced" + uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-ced" + mirrors: + - localai/localai-backends:latest-gpu-nvidia-cuda-13-ced +- !!merge <<: *ced + name: "cuda13-ced-development" + uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-ced" + mirrors: + - localai/localai-backends:master-gpu-nvidia-cuda-13-ced ## stablediffusion-ggml - !!merge <<: *stablediffusionggml name: "cpu-stablediffusion-ggml" @@ -2781,6 +2958,17 @@ nvidia-cuda-13: "cuda13-vllm-development" nvidia-l4t-cuda-13: 
"cuda13-nvidia-l4t-arm64-vllm-development" cpu: "cpu-vllm-development" + metal: "metal-vllm-development" +- !!merge <<: *vllm + name: "metal-vllm" + uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-vllm" + mirrors: + - localai/localai-backends:latest-metal-darwin-arm64-vllm +- !!merge <<: *vllm + name: "metal-vllm-development" + uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-vllm" + mirrors: + - localai/localai-backends:master-metal-darwin-arm64-vllm - !!merge <<: *vllm name: "cuda12-vllm" uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm" @@ -3060,6 +3248,7 @@ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sam3-cpp-development" intel: "intel-sycl-f32-sam3-cpp-development" vulkan: "vulkan-sam3-cpp-development" + metal: "metal-sam3-cpp-development" - !!merge <<: *sam3cpp name: "cpu-sam3-cpp" uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-sam3-cpp" @@ -3070,6 +3259,16 @@ uri: "quay.io/go-skynet/local-ai-backends:master-cpu-sam3-cpp" mirrors: - localai/localai-backends:master-cpu-sam3-cpp +- !!merge <<: *sam3cpp + name: "metal-sam3-cpp" + uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-sam3-cpp" + mirrors: + - localai/localai-backends:latest-metal-darwin-arm64-sam3-cpp +- !!merge <<: *sam3cpp + name: "metal-sam3-cpp-development" + uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-sam3-cpp" + mirrors: + - localai/localai-backends:master-metal-darwin-arm64-sam3-cpp - !!merge <<: *sam3cpp name: "cuda12-sam3-cpp" uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-sam3-cpp" @@ -3143,6 +3342,7 @@ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-rfdetr-cpp-development" intel: "intel-sycl-f32-rfdetr-cpp-development" vulkan: "vulkan-rfdetr-cpp-development" + metal: "metal-rfdetr-cpp-development" - !!merge <<: *rfdetrcpp name: "cpu-rfdetr-cpp" uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-rfdetr-cpp" @@ -3153,6 +3353,16 @@ uri: 
"quay.io/go-skynet/local-ai-backends:master-cpu-rfdetr-cpp" mirrors: - localai/localai-backends:master-cpu-rfdetr-cpp +- !!merge <<: *rfdetrcpp + name: "metal-rfdetr-cpp" + uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-rfdetr-cpp" + mirrors: + - localai/localai-backends:latest-metal-darwin-arm64-rfdetr-cpp +- !!merge <<: *rfdetrcpp + name: "metal-rfdetr-cpp-development" + uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-rfdetr-cpp" + mirrors: + - localai/localai-backends:master-metal-darwin-arm64-rfdetr-cpp - !!merge <<: *rfdetrcpp name: "cuda12-rfdetr-cpp" uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-rfdetr-cpp" @@ -3941,6 +4151,16 @@ uri: "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-localvqe" mirrors: - localai/localai-backends:master-gpu-vulkan-localvqe +- !!merge <<: *localvqecpp + name: "metal-localvqe" + uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-localvqe" + mirrors: + - localai/localai-backends:latest-metal-darwin-arm64-localvqe +- !!merge <<: *localvqecpp + name: "metal-localvqe-development" + uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-localvqe" + mirrors: + - localai/localai-backends:master-metal-darwin-arm64-localvqe ## kokoro - !!merge <<: *kokoro name: "kokoro-development" @@ -4466,6 +4686,7 @@ nvidia-cuda-13: "cuda13-liquid-audio-development" nvidia-cuda-12: "cuda12-liquid-audio-development" nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-liquid-audio-development" + metal: "metal-liquid-audio-development" - !!merge <<: *liquid-audio name: "cpu-liquid-audio" uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-liquid-audio" @@ -4476,6 +4697,16 @@ uri: "quay.io/go-skynet/local-ai-backends:master-cpu-liquid-audio" mirrors: - localai/localai-backends:master-cpu-liquid-audio +- !!merge <<: *liquid-audio + name: "metal-liquid-audio" + uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-liquid-audio" + mirrors: + - 
localai/localai-backends:latest-metal-darwin-arm64-liquid-audio +- !!merge <<: *liquid-audio + name: "metal-liquid-audio-development" + uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-liquid-audio" + mirrors: + - localai/localai-backends:master-metal-darwin-arm64-liquid-audio - !!merge <<: *liquid-audio name: "cuda12-liquid-audio" uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-liquid-audio" @@ -5136,6 +5367,7 @@ nvidia: "cuda12-trl" nvidia-cuda-12: "cuda12-trl" nvidia-cuda-13: "cuda13-trl" + metal: "metal-trl" ## TRL backend images - !!merge <<: *trl name: "cpu-trl" @@ -5167,6 +5399,16 @@ uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-trl" mirrors: - localai/localai-backends:master-gpu-nvidia-cuda-13-trl +- !!merge <<: *trl + name: "metal-trl" + uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-trl" + mirrors: + - localai/localai-backends:latest-metal-darwin-arm64-trl +- !!merge <<: *trl + name: "metal-trl-development" + uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-trl" + mirrors: + - localai/localai-backends:master-metal-darwin-arm64-trl ## llama.cpp quantization backend - &llama-cpp-quantization name: "llama-cpp-quantization" @@ -5338,6 +5580,7 @@ name: "supertonic-development" capabilities: default: "cpu-supertonic-development" + metal: "metal-supertonic-development" - !!merge <<: *supertonic name: "cpu-supertonic" uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-supertonic" @@ -5348,3 +5591,13 @@ uri: "quay.io/go-skynet/local-ai-backends:master-cpu-supertonic" mirrors: - localai/localai-backends:master-cpu-supertonic +- !!merge <<: *supertonic + name: "metal-supertonic" + uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-supertonic" + mirrors: + - localai/localai-backends:latest-metal-darwin-arm64-supertonic +- !!merge <<: *supertonic + name: "metal-supertonic-development" + uri: 
"quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-supertonic" + mirrors: + - localai/localai-backends:master-metal-darwin-arm64-supertonic diff --git a/backend/python/diffusers/requirements-cpu.txt b/backend/python/diffusers/requirements-cpu.txt index 8db419b29..46959222c 100644 --- a/backend/python/diffusers/requirements-cpu.txt +++ b/backend/python/diffusers/requirements-cpu.txt @@ -1,7 +1,7 @@ --extra-index-url https://download.pytorch.org/whl/cpu -git+https://github.com/huggingface/diffusers +diffusers==0.38.0 opencv-python -transformers +transformers==4.57.6 torchvision==0.22.1 accelerate git+https://github.com/xhinker/sd_embed @@ -10,9 +10,15 @@ sentencepiece torch==2.7.1 optimum-quanto ftfy -# TODO: re-add compel once it supports transformers >= 5. -# Tracking: https://github.com/damian0815/compel/pull/129 -# https://github.com/damian0815/compel/issues/128 -# compel currently pins transformers~=4.25, which forced pip into multi-hour -# resolver backtracking storms in CI. backend.py imports it lazily and gates -# the COMPEL=1 env var on the import succeeding, so dropping it here is safe. \ No newline at end of file +# diffusers and transformers are pinned together on purpose. transformers v5 +# restructured CLIPTextModel and dropped the `.text_model` attribute, which +# breaks single-file Stable Diffusion loading on every released diffusers +# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking +# main via git froze whichever broken pair existed at image-build time. Pin the +# last known-good released pair so builds are reproducible and can't drift into +# the broken window. See https://github.com/mudler/LocalAI/issues/9979 +# +# compel is intentionally omitted: it pins transformers~=4.25, which conflicts +# with this pin and previously forced pip into multi-hour resolver backtracking +# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on +# the import succeeding, so dropping it here is safe. 
\ No newline at end of file diff --git a/backend/python/diffusers/requirements-cublas12.txt b/backend/python/diffusers/requirements-cublas12.txt index e3351ae75..5e6852cc7 100644 --- a/backend/python/diffusers/requirements-cublas12.txt +++ b/backend/python/diffusers/requirements-cublas12.txt @@ -1,7 +1,7 @@ --extra-index-url https://download.pytorch.org/whl/cu121 -git+https://github.com/huggingface/diffusers +diffusers==0.38.0 opencv-python -transformers +transformers==4.57.6 torchvision accelerate git+https://github.com/xhinker/sd_embed @@ -10,9 +10,15 @@ sentencepiece torch ftfy optimum-quanto -# TODO: re-add compel once it supports transformers >= 5. -# Tracking: https://github.com/damian0815/compel/pull/129 -# https://github.com/damian0815/compel/issues/128 -# compel currently pins transformers~=4.25, which forced pip into multi-hour -# resolver backtracking storms in CI. backend.py imports it lazily and gates -# the COMPEL=1 env var on the import succeeding, so dropping it here is safe. +# diffusers and transformers are pinned together on purpose. transformers v5 +# restructured CLIPTextModel and dropped the `.text_model` attribute, which +# breaks single-file Stable Diffusion loading on every released diffusers +# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking +# main via git froze whichever broken pair existed at image-build time. Pin the +# last known-good released pair so builds are reproducible and can't drift into +# the broken window. See https://github.com/mudler/LocalAI/issues/9979 +# +# compel is intentionally omitted: it pins transformers~=4.25, which conflicts +# with this pin and previously forced pip into multi-hour resolver backtracking +# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on +# the import succeeding, so dropping it here is safe. 
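The "pin the pair together" rule above can be checked mechanically, so a later edit cannot quietly reintroduce `git+https://github.com/huggingface/diffusers` or an unpinned `transformers`. Below is a minimal sketch of such a check. `check_pinned_pair` is a hypothetical lint helper and is not part of this PR. It assumes one requirement per line and treats `#` as a comment only at line start or after whitespace, as pip does. It does not implement full PEP 508 parsing.

```python
import re

# Hypothetical lint helper (not part of this PR): verifies that a
# requirements file pins diffusers and transformers with exact "=="
# versions instead of tracking a git branch.
PINNED_PAIR = ("diffusers", "transformers")

# Package name (optionally with [extras]) followed by an exact "==" pin.
_REQ_RE = re.compile(r"^([A-Za-z0-9_.\-\[\]]+)\s*==\s*([^\s#;]+)")


def check_pinned_pair(requirements_text: str) -> dict:
    """Return {package: version} for the pair; raise ValueError on drift."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Strip comments: "#" at line start or preceded by whitespace.
        line = re.sub(r"(^|\s)#.*", "", raw).strip()
        if not line or line.startswith("-"):
            continue  # blank line or pip option such as --extra-index-url
        for pkg in PINNED_PAIR:
            if line.startswith("git+") and f"/{pkg}" in line:
                raise ValueError(f"{pkg} is installed from git; pin a release")
        match = _REQ_RE.match(line)
        if match:
            name = match.group(1).split("[", 1)[0].lower()
            if name in PINNED_PAIR:
                pins[name] = match.group(2)
    missing = [pkg for pkg in PINNED_PAIR if pkg not in pins]
    if missing:
        raise ValueError(f"not pinned with '==': {', '.join(missing)}")
    return pins
```

Run against any of the `requirements-*.txt` files in this diff, it should return `{"diffusers": "0.38.0", "transformers": "4.57.6"}`. The pre-PR form, with `git+https://github.com/huggingface/diffusers` and a bare `transformers`, raises instead. The check fails loudly rather than warning because the broken combination only shows up at model-load time, long after the image is built.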
diff --git a/backend/python/diffusers/requirements-cublas13.txt b/backend/python/diffusers/requirements-cublas13.txt index 546998ba4..ce77b6e6e 100644 --- a/backend/python/diffusers/requirements-cublas13.txt +++ b/backend/python/diffusers/requirements-cublas13.txt @@ -1,7 +1,7 @@ --extra-index-url https://download.pytorch.org/whl/cu130 -git+https://github.com/huggingface/diffusers +diffusers==0.38.0 opencv-python -transformers +transformers==4.57.6 torchvision accelerate git+https://github.com/xhinker/sd_embed @@ -10,9 +10,15 @@ sentencepiece torch ftfy optimum-quanto -# TODO: re-add compel once it supports transformers >= 5. -# Tracking: https://github.com/damian0815/compel/pull/129 -# https://github.com/damian0815/compel/issues/128 -# compel currently pins transformers~=4.25, which forced pip into multi-hour -# resolver backtracking storms in CI. backend.py imports it lazily and gates -# the COMPEL=1 env var on the import succeeding, so dropping it here is safe. +# diffusers and transformers are pinned together on purpose. transformers v5 +# restructured CLIPTextModel and dropped the `.text_model` attribute, which +# breaks single-file Stable Diffusion loading on every released diffusers +# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking +# main via git froze whichever broken pair existed at image-build time. Pin the +# last known-good released pair so builds are reproducible and can't drift into +# the broken window. See https://github.com/mudler/LocalAI/issues/9979 +# +# compel is intentionally omitted: it pins transformers~=4.25, which conflicts +# with this pin and previously forced pip into multi-hour resolver backtracking +# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on +# the import succeeding, so dropping it here is safe. 
diff --git a/backend/python/diffusers/requirements-hipblas.txt b/backend/python/diffusers/requirements-hipblas.txt index 3480d1fd6..f3666d5f5 100644 --- a/backend/python/diffusers/requirements-hipblas.txt +++ b/backend/python/diffusers/requirements-hipblas.txt @@ -1,17 +1,23 @@ --extra-index-url https://download.pytorch.org/whl/rocm7.0 torch==2.10.0+rocm7.0 torchvision==0.25.0+rocm7.0 -git+https://github.com/huggingface/diffusers +diffusers==0.38.0 opencv-python -transformers +transformers==4.57.6 accelerate peft sentencepiece optimum-quanto ftfy -# TODO: re-add compel once it supports transformers >= 5. -# Tracking: https://github.com/damian0815/compel/pull/129 -# https://github.com/damian0815/compel/issues/128 -# compel currently pins transformers~=4.25, which forced pip into multi-hour -# resolver backtracking storms in CI. backend.py imports it lazily and gates -# the COMPEL=1 env var on the import succeeding, so dropping it here is safe. \ No newline at end of file +# diffusers and transformers are pinned together on purpose. transformers v5 +# restructured CLIPTextModel and dropped the `.text_model` attribute, which +# breaks single-file Stable Diffusion loading on every released diffusers +# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking +# main via git froze whichever broken pair existed at image-build time. Pin the +# last known-good released pair so builds are reproducible and can't drift into +# the broken window. See https://github.com/mudler/LocalAI/issues/9979 +# +# compel is intentionally omitted: it pins transformers~=4.25, which conflicts +# with this pin and previously forced pip into multi-hour resolver backtracking +# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on +# the import succeeding, so dropping it here is safe. 
\ No newline at end of file diff --git a/backend/python/diffusers/requirements-intel.txt b/backend/python/diffusers/requirements-intel.txt index c78f5ef23..73ab5b3b8 100644 --- a/backend/python/diffusers/requirements-intel.txt +++ b/backend/python/diffusers/requirements-intel.txt @@ -3,18 +3,24 @@ torch torchvision optimum[openvino] setuptools -git+https://github.com/huggingface/diffusers +diffusers==0.38.0 opencv-python -transformers +transformers==4.57.6 accelerate git+https://github.com/xhinker/sd_embed peft sentencepiece optimum-quanto ftfy -# TODO: re-add compel once it supports transformers >= 5. -# Tracking: https://github.com/damian0815/compel/pull/129 -# https://github.com/damian0815/compel/issues/128 -# compel currently pins transformers~=4.25, which forced pip into multi-hour -# resolver backtracking storms in CI. backend.py imports it lazily and gates -# the COMPEL=1 env var on the import succeeding, so dropping it here is safe. \ No newline at end of file +# diffusers and transformers are pinned together on purpose. transformers v5 +# restructured CLIPTextModel and dropped the `.text_model` attribute, which +# breaks single-file Stable Diffusion loading on every released diffusers +# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking +# main via git froze whichever broken pair existed at image-build time. Pin the +# last known-good released pair so builds are reproducible and can't drift into +# the broken window. See https://github.com/mudler/LocalAI/issues/9979 +# +# compel is intentionally omitted: it pins transformers~=4.25, which conflicts +# with this pin and previously forced pip into multi-hour resolver backtracking +# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on +# the import succeeding, so dropping it here is safe. 
\ No newline at end of file diff --git a/backend/python/diffusers/requirements-l4t12.txt b/backend/python/diffusers/requirements-l4t12.txt index 15857c4b0..9a9cdb0df 100644 --- a/backend/python/diffusers/requirements-l4t12.txt +++ b/backend/python/diffusers/requirements-l4t12.txt @@ -1,7 +1,7 @@ --extra-index-url https://pypi.jetson-ai-lab.io/jp6/cu129/ torch -git+https://github.com/huggingface/diffusers -transformers +diffusers==0.38.0 +transformers==4.57.6 accelerate peft optimum-quanto @@ -9,9 +9,15 @@ numpy<2 sentencepiece torchvision ftfy -# TODO: re-add compel once it supports transformers >= 5. -# Tracking: https://github.com/damian0815/compel/pull/129 -# https://github.com/damian0815/compel/issues/128 -# compel currently pins transformers~=4.25, which forced pip into multi-hour -# resolver backtracking storms in CI. backend.py imports it lazily and gates -# the COMPEL=1 env var on the import succeeding, so dropping it here is safe. +# diffusers and transformers are pinned together on purpose. transformers v5 +# restructured CLIPTextModel and dropped the `.text_model` attribute, which +# breaks single-file Stable Diffusion loading on every released diffusers +# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking +# main via git froze whichever broken pair existed at image-build time. Pin the +# last known-good released pair so builds are reproducible and can't drift into +# the broken window. See https://github.com/mudler/LocalAI/issues/9979 +# +# compel is intentionally omitted: it pins transformers~=4.25, which conflicts +# with this pin and previously forced pip into multi-hour resolver backtracking +# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on +# the import succeeding, so dropping it here is safe. 
diff --git a/backend/python/diffusers/requirements-l4t13.txt b/backend/python/diffusers/requirements-l4t13.txt index 226033a61..964c9c9f2 100644 --- a/backend/python/diffusers/requirements-l4t13.txt +++ b/backend/python/diffusers/requirements-l4t13.txt @@ -1,7 +1,7 @@ --extra-index-url https://download.pytorch.org/whl/cu130 torch -git+https://github.com/huggingface/diffusers -transformers +diffusers==0.38.0 +transformers==4.57.6 accelerate peft optimum-quanto @@ -10,9 +10,15 @@ sentencepiece torchvision ftfy chardet -# TODO: re-add compel once it supports transformers >= 5. -# Tracking: https://github.com/damian0815/compel/pull/129 -# https://github.com/damian0815/compel/issues/128 -# compel currently pins transformers~=4.25, which forced pip into multi-hour -# resolver backtracking storms in CI. backend.py imports it lazily and gates -# the COMPEL=1 env var on the import succeeding, so dropping it here is safe. +# diffusers and transformers are pinned together on purpose. transformers v5 +# restructured CLIPTextModel and dropped the `.text_model` attribute, which +# breaks single-file Stable Diffusion loading on every released diffusers +# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking +# main via git froze whichever broken pair existed at image-build time. Pin the +# last known-good released pair so builds are reproducible and can't drift into +# the broken window. See https://github.com/mudler/LocalAI/issues/9979 +# +# compel is intentionally omitted: it pins transformers~=4.25, which conflicts +# with this pin and previously forced pip into multi-hour resolver backtracking +# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on +# the import succeeding, so dropping it here is safe. 
diff --git a/backend/python/diffusers/requirements-mps.txt b/backend/python/diffusers/requirements-mps.txt index 58eb65f02..eeea59ddd 100644 --- a/backend/python/diffusers/requirements-mps.txt +++ b/backend/python/diffusers/requirements-mps.txt @@ -1,16 +1,22 @@ torch==2.7.1 torchvision==0.22.1 -git+https://github.com/huggingface/diffusers +diffusers==0.38.0 opencv-python -transformers +transformers==4.57.6 accelerate peft sentencepiece optimum-quanto ftfy -# TODO: re-add compel once it supports transformers >= 5. -# Tracking: https://github.com/damian0815/compel/pull/129 -# https://github.com/damian0815/compel/issues/128 -# compel currently pins transformers~=4.25, which forced pip into multi-hour -# resolver backtracking storms in CI. backend.py imports it lazily and gates -# the COMPEL=1 env var on the import succeeding, so dropping it here is safe. \ No newline at end of file +# diffusers and transformers are pinned together on purpose. transformers v5 +# restructured CLIPTextModel and dropped the `.text_model` attribute, which +# breaks single-file Stable Diffusion loading on every released diffusers +# (<=0.38.0); only unreleased diffusers main supports transformers v5. Tracking +# main via git froze whichever broken pair existed at image-build time. Pin the +# last known-good released pair so builds are reproducible and can't drift into +# the broken window. See https://github.com/mudler/LocalAI/issues/9979 +# +# compel is intentionally omitted: it pins transformers~=4.25, which conflicts +# with this pin and previously forced pip into multi-hour resolver backtracking +# storms in CI. backend.py imports it lazily and gates the COMPEL=1 env var on +# the import succeeding, so dropping it here is safe. 
\ No newline at end of file diff --git a/backend/python/liquid-audio/install.sh b/backend/python/liquid-audio/install.sh index c7ed8eaa8..fe0f9caad 100755 --- a/backend/python/liquid-audio/install.sh +++ b/backend/python/liquid-audio/install.sh @@ -14,5 +14,11 @@ else fi # liquid-audio's torch wheels are large; allow upgrades to satisfy transitive pins -EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match" +EXTRA_PIP_INSTALL_FLAGS+=" --upgrade" +# --index-strategy is a uv-only flag. The darwin/MPS build installs with pip +# (USE_PIP=true in scripts/build/python-darwin.sh), which rejects it. Only add +# it on the uv path; Linux/CUDA resolution is unchanged. +if [ "x${USE_PIP:-}" != "xtrue" ]; then + EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-first-match" +fi installRequirements diff --git a/backend/python/liquid-audio/requirements-mps.txt b/backend/python/liquid-audio/requirements-mps.txt index f57687f29..3c9c36cca 100644 --- a/backend/python/liquid-audio/requirements-mps.txt +++ b/backend/python/liquid-audio/requirements-mps.txt @@ -1,3 +1,4 @@ +# MPS (Apple Silicon / Metal) build profile - installed by the darwin CI job. torch>=2.8.0 torchaudio>=2.8.0 torchcodec>=0.9.1 diff --git a/backend/python/nemo/backend.py b/backend/python/nemo/backend.py index ccbff7cd2..a5c30694e 100644 --- a/backend/python/nemo/backend.py +++ b/backend/python/nemo/backend.py @@ -84,6 +84,135 @@ class BackendServicer(backend_pb2_grpc.BackendServicer): return backend_pb2.Result(message="Model loaded successfully", success=True) + def _get_stride_seconds(self): + """Compute the seconds-per-frame stride for the loaded model. 
+ + stride = preprocessor_window_stride * encoder_subsampling_factor + """ + try: + preprocessor = self.model.preprocessor + window_stride = preprocessor._cfg.get('window_stride', 0.01) + subsampling_factor = getattr(self.model.encoder, 'subsampling_factor', 8) + return window_stride * subsampling_factor + except (AttributeError, KeyError, TypeError) as err: + print( + f"Warning: could not compute stride from model config ({err}), " + f"falling back to 0.08s/frame", + file=sys.stderr, + ) + return 0.08 + + def _build_segments_with_words(self, hypothesis, stride, timestamp_granularities=None): + """Build TranscriptSegment list from a NeMo Hypothesis with timestamps. + + Supports two granularity modes: + - "word": one TranscriptSegment per word, each with a single TranscriptWord entry + - "segment" (default): merge consecutive words into sentence-level segments, + splitting at word-level time gaps that exceed a dynamic threshold. + """ + if not hypothesis or not isinstance(hypothesis.timestamp, dict): + return [] + + word_offsets = hypothesis.timestamp.get('word', []) + if not word_offsets: + return [] + + granularities = list(timestamp_granularities) if timestamp_granularities else [] + granularity = "word" if "word" in granularities else "segment" + + # Build a flat list of (text, start_ns, end_ns) from NeMo word offsets + transcript_words = [] + for wo in word_offsets: + word_text = wo.get('word', '') + if not word_text: + continue + start_offset = wo.get('start_offset', 0) + end_offset = wo.get('end_offset', start_offset) + start_ns = int(start_offset * stride * 1_000_000_000) + end_ns = int(end_offset * stride * 1_000_000_000) + transcript_words.append({ + 'text': word_text, + 'start': start_ns, + 'end': end_ns, + }) + + if not transcript_words: + return [] + + if granularity == "word": + # One segment per word + result = [] + for idx, tw in enumerate(transcript_words): + word = backend_pb2.TranscriptWord( + start=tw['start'], end=tw['end'], text=tw['text'] + ) 
+ result.append(backend_pb2.TranscriptSegment( + id=idx, + start=tw['start'], + end=tw['end'], + text=tw['text'], + words=[word], + )) + return result + + # segment mode — merge at word-level time-gap boundaries + # Compute gap threshold: median inter-word gap * 3, clamped to [0.3, 2.0]s + gaps = [] + for i in range(1, len(transcript_words)): + gap = (transcript_words[i]['start'] - transcript_words[i - 1]['end']) / 1_000_000_000 + if gap > 0: + gaps.append(gap) + if gaps: + gaps.sort() + median_gap = gaps[len(gaps) // 2] + threshold_ns = int(max(0.3, min(median_gap * 3, 2.0)) * 1_000_000_000) + else: + threshold_ns = int(0.5 * 1_000_000_000) + + result = [] + buf_words = [] # list of TranscriptWord protobuf + buf_start = None + buf_end = 0 + buf_text = [] + prev_end = None + + for tw in transcript_words: + # Detect word-level time gap + if prev_end is not None and (tw['start'] - prev_end) >= threshold_ns and buf_text: + seg_text = ' '.join(buf_text) + result.append(backend_pb2.TranscriptSegment( + id=len(result), + start=buf_start, + end=buf_end, + text=seg_text, + words=list(buf_words), + )) + buf_words = [] + buf_text = [] + buf_start = None + + if buf_start is None: + buf_start = tw['start'] + buf_end = tw['end'] + buf_text.append(tw['text']) + buf_words.append(backend_pb2.TranscriptWord( + start=tw['start'], end=tw['end'], text=tw['text'] + )) + prev_end = tw['end'] + + # flush remaining + if buf_text and buf_start is not None: + seg_text = ' '.join(buf_text) + result.append(backend_pb2.TranscriptSegment( + id=len(result), + start=buf_start, + end=buf_end, + text=seg_text, + words=list(buf_words), + )) + + return result + def AudioTranscription(self, request, context): result_segments = [] text = "" @@ -93,26 +222,67 @@ class BackendServicer(backend_pb2_grpc.BackendServicer): print(f"Error: Audio file not found: {audio_path}", file=sys.stderr) return backend_pb2.TranscriptResult(segments=[], text="") - # NEMO's transcribe method accepts a list of audio paths 
and returns a list of transcripts - results = self.model.transcribe([audio_path]) + # Determine requested timestamp granularity + timestamp_granularities = list(request.timestamp_granularities) if request.timestamp_granularities else [] + want_timestamps = bool(timestamp_granularities) - if not results or len(results) == 0: - return backend_pb2.TranscriptResult(segments=[], text="") + if want_timestamps: + # Request timestamps from NeMo. + # timestamps=True forces NeMo to return Hypothesis objects with + # the timestamp dict populated, so we omit return_hypotheses to + # let NeMo choose the correct return type. + results = self.model.transcribe([audio_path], timestamps=True) - # Get the transcript text from the first result. - # CTC models return List[str], TDT/RNNT models return List[Hypothesis] - # where the actual text lives in Hypothesis.text. - result = results[0] - if isinstance(result, str): - text = result + if results and len(results) > 0: + hypotheses = results[0] if isinstance(results[0], list) else results + if hypotheses and len(hypotheses) > 0: + hypothesis = hypotheses[0] + + # Hypothesis object should have .timestamp populated + if not hasattr(hypothesis, 'timestamp') or not isinstance(hypothesis.timestamp, dict): + print( + "Warning: timestamps were requested but NeMo did not return " + "Hypothesis objects; falling back to untimestamped output", + file=sys.stderr, + ) + + # Extract text + if hasattr(hypothesis, 'text'): + text = hypothesis.text or "" + elif isinstance(hypothesis, str): + text = hypothesis + + # Build segments with word-level timestamps + stride = self._get_stride_seconds() + result_segments = self._build_segments_with_words( + hypothesis, stride, timestamp_granularities + ) + + # If no word offsets but we have text, fall back to single segment + if not result_segments and text: + result_segments.append(backend_pb2.TranscriptSegment( + id=0, start=0, end=0, text=text + )) else: - text = getattr(result, 'text', None) or "" + # Simple 
transcription without timestamps + # NEMO's transcribe method accepts a list of audio paths and returns a list of transcripts + results = self.model.transcribe([audio_path]) - if text: - # Create a single segment with the full transcription - result_segments.append(backend_pb2.TranscriptSegment( - id=0, start=0, end=0, text=text - )) + if results and len(results) > 0: + # Get the transcript text from the first result. + # CTC models return List[str], TDT/RNNT models return List[Hypothesis] + # where the actual text lives in Hypothesis.text. + result = results[0] + if isinstance(result, str): + text = result + else: + text = getattr(result, 'text', None) or "" + + if text: + # Create a single segment with the full transcription + result_segments.append(backend_pb2.TranscriptSegment( + id=0, start=0, end=0, text=text + )) except Exception as err: print(f"Error in AudioTranscription: {err}", file=sys.stderr) diff --git a/backend/python/trl/backend.py b/backend/python/trl/backend.py index 3ea4de975..2e7cd34ab 100644 --- a/backend/python/trl/backend.py +++ b/backend/python/trl/backend.py @@ -309,6 +309,10 @@ class BackendServicer(backend_pb2_grpc.BackendServicer): dataset_split = request.dataset_split or "train" if os.path.exists(request.dataset_source): + _allowed_dir = os.path.realpath(os.path.abspath(os.environ.get("LOCALAI_DATASET_DIR", os.getcwd()))) + _real_path = os.path.realpath(os.path.abspath(request.dataset_source)) + if not (_real_path == _allowed_dir or _real_path.startswith(_allowed_dir + os.sep)): + raise ValueError("Dataset source path is outside the allowed directory") if request.dataset_source.endswith('.json') or request.dataset_source.endswith('.jsonl'): dataset = load_dataset("json", data_files=request.dataset_source, split=dataset_split) elif request.dataset_source.endswith('.csv'): @@ -687,6 +691,11 @@ class BackendServicer(backend_pb2_grpc.BackendServicer): def ExportModel(self, request, context): export_format = request.export_format or "lora" 
output_path = request.output_path + _allowed_output_dir = os.path.realpath(os.path.abspath(os.environ.get("LOCALAI_OUTPUT_DIR", os.getcwd()))) + _real_output_path = os.path.realpath(os.path.abspath(output_path)) + if not (_real_output_path == _allowed_output_dir or _real_output_path.startswith(_allowed_output_dir + os.sep)): + raise ValueError("Output path is outside the allowed directory") + output_path = _real_output_path checkpoint_path = request.checkpoint_path # Extract HF token for gated model access @@ -807,7 +816,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer): env = os.environ.copy() env["NO_LOCAL_GGUF"] = "1" cmd = [sys.executable, convert_script, merge_dir, "--outtype", outtype, "--outfile", gguf_path] - conv_result = subprocess.run(cmd, capture_output=True, text=True, timeout=3600, env=env) + conv_result = subprocess.run(cmd, capture_output=True, text=True, timeout=3600, env=env, shell=False) # nosemgrep: python.django.security.injection.command.subprocess-injection.subprocess-injection if conv_result.returncode != 0: diag = f"stdout: {conv_result.stdout[-300:]}\nstderr: {conv_result.stderr[-500:]}" return backend_pb2.Result(success=False, diff --git a/backend/python/trl/install.sh b/backend/python/trl/install.sh index 6963e60ed..ce0552f87 100644 --- a/backend/python/trl/install.sh +++ b/backend/python/trl/install.sh @@ -8,7 +8,13 @@ else source $backend_dir/../common/libbackend.sh fi -EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match" +EXTRA_PIP_INSTALL_FLAGS+=" --upgrade" +# --index-strategy is a uv-only flag. The darwin/MPS build installs with pip +# (USE_PIP=true in scripts/build/python-darwin.sh), which rejects it. Only add +# it when uv is the installer, keeping the Linux/CUDA resolution unchanged. 
+if [ "x${USE_PIP:-}" != "xtrue" ]; then + EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-first-match" +fi installRequirements # Fetch convert_hf_to_gguf.py and gguf package from the same llama.cpp version diff --git a/backend/python/trl/requirements-mps.txt b/backend/python/trl/requirements-mps.txt new file mode 100644 index 000000000..fbdfb6536 --- /dev/null +++ b/backend/python/trl/requirements-mps.txt @@ -0,0 +1,12 @@ +torch==2.10.0 +trl +peft +datasets>=3.0.0 +transformers>=4.56.2 +accelerate>=1.4.0 +huggingface-hub>=1.3.0 +sentencepiece +# Note: bitsandbytes is intentionally omitted on MPS. It is only used by the +# CUDA (cublas) variants for 8-bit/4-bit quantization and has poor support on +# Apple Silicon. torch here uses the plain PyPI wheels, which ship MPS support +# on macOS arm64. diff --git a/backend/python/vllm/backend.py b/backend/python/vllm/backend.py index 5d5662857..1e93f26e2 100644 --- a/backend/python/vllm/backend.py +++ b/backend/python/vllm/backend.py @@ -48,8 +48,10 @@ try: except ImportError: HAS_REASONING_PARSERS = False +# vLLM >= 0.23 renamed GuidedDecodingParams -> StructuredOutputsParams and the +# SamplingParams field guided_decoding -> structured_outputs. try: - from vllm.sampling_params import GuidedDecodingParams + from vllm.sampling_params import StructuredOutputsParams HAS_GUIDED_DECODING = True except ImportError: HAS_GUIDED_DECODING = False @@ -455,9 +457,14 @@ class BackendServicer(backend_pb2_grpc.BackendServicer): except Exception: pass - if last_output is None or not getattr(last_output, "prompt_logprobs", None): - context.set_code(grpc.StatusCode.INTERNAL) - context.set_details("vLLM did not return prompt_logprobs") + _pl = getattr(last_output, "prompt_logprobs", None) if last_output is not None else None + # Some engines accept the prompt_logprobs request but return a + # list of all-None entries instead of computing them (observed + # with vllm-metal's MLX backend on macOS). 
Treat that as + # unsupported rather than silently scoring every candidate as 0. + if not _pl or all(e is None for e in _pl): + context.set_code(grpc.StatusCode.UNIMPLEMENTED) + context.set_details("This backend did not return prompt_logprobs; scoring is unsupported on this engine (e.g. vllm-metal / MLX on macOS).") return backend_pb2.ScoreResponse() prompt_logprobs = last_output.prompt_logprobs @@ -536,13 +543,13 @@ class BackendServicer(backend_pb2_grpc.BackendServicer): if value not in (None, 0, [], False): setattr(sampling_params, param_field, value) - # Guided decoding: use Grammar field to pass JSON schema or BNF + # Structured-output decoding: use Grammar field to pass JSON schema or BNF if HAS_GUIDED_DECODING and request.Grammar: try: json.loads(request.Grammar) # valid JSON = JSON schema - sampling_params.guided_decoding = GuidedDecodingParams(json=request.Grammar) + sampling_params.structured_outputs = StructuredOutputsParams(json=request.Grammar) except json.JSONDecodeError: - sampling_params.guided_decoding = GuidedDecodingParams(grammar=request.Grammar) + sampling_params.structured_outputs = StructuredOutputsParams(grammar=request.Grammar) # Extract image paths and process images prompt = request.Prompt @@ -596,23 +603,124 @@ class BackendServicer(backend_pb2_grpc.BackendServicer): # Stream the results generated_text = "" + generated_token_ids: list[int] = [] last_output = None + + # Tool-parsing strategy decision (made once, before the loop): + # + # When a tool parser is active, the model's raw tool-call markup + # (e.g. ...) must not be streamed verbatim as delta.content + # — clients would see the unparsed syntax. Two paths: + # + # (A) native streaming via parser.extract_tool_calls_streaming. All + # concrete tool parsers shipped with vLLM 0.23+ implement this + # (Granite4, Qwen3Coder, DeepSeekV31, Jamba, Ernie45, Hermes, + # llama3_json, mistral, …). 
The parser decides per-delta whether + # to emit content or suppress tool-call markup, and emits a + # structured DeltaMessage(tool_calls=[...]) when a call is ready. + # (B) buffer fallback — used only when the parser surprisingly lacks + # the streaming method or it raises mid-stream. The post-loop + # extract_tool_calls assembles the final chat_delta. Same correctness + # guarantee as a non-streaming response, at the cost of a delayed + # final chunk. + has_tool_parser = bool(self.tool_parser_cls and request.Tools) + tp_instance = None + tp_request = None + native_streaming = False + native_streaming_error = False + if has_tool_parser: + try: + tools_for_parser = json.loads(request.Tools) + except json.JSONDecodeError: + tools_for_parser = [] + try: + tp_instance = self.tool_parser_cls(self.tokenizer, tools=tools_for_parser) + except TypeError: + tp_instance = self.tool_parser_cls(self.tokenizer) + # Build a minimal ChatCompletionRequest so the streaming method + # sees the tools list. We do not need any other request fields — + # parsers only read .tools (and sometimes .tool_choice, which we + # leave at default). 
+ try: + from vllm.entrypoints.openai.chat_completion.protocol import ( + ChatCompletionRequest as _CCR, + ) + tp_request = _CCR( + model="local", + messages=[{"role": "user", "content": ""}], + tools=tools_for_parser or None, + ) + except Exception as e: + print(f"Could not build ChatCompletionRequest for streaming parser: {e}", + file=sys.stderr) + tp_request = None + native_streaming = ( + tp_request is not None + and hasattr(tp_instance, "extract_tool_calls_streaming") + ) + try: async for request_output in outputs: iteration_text = request_output.outputs[0].text last_output = request_output if streaming: - # Remove text already sent as vllm concatenates the text from previous yields delta_iteration_text = iteration_text.removeprefix(generated_text) - # Send the partial result - yield backend_pb2.Reply( - message=bytes(delta_iteration_text, encoding='utf-8'), - chat_deltas=[backend_pb2.ChatDelta(content=delta_iteration_text)], - ) + new_token_ids = list(request_output.outputs[0].token_ids) + delta_token_ids = new_token_ids[len(generated_token_ids):] - # Keep track of text generated + if not has_tool_parser: + # Plain streaming — unchanged from pre-tool-parser path. + yield backend_pb2.Reply( + message=bytes(delta_iteration_text, encoding='utf-8'), + chat_deltas=[backend_pb2.ChatDelta(content=delta_iteration_text)], + ) + elif native_streaming and not native_streaming_error: + # (A) Native vLLM extract_tool_calls_streaming. 
+ try: + msg = tp_instance.extract_tool_calls_streaming( + previous_text=generated_text, + current_text=iteration_text, + delta_text=delta_iteration_text, + previous_token_ids=generated_token_ids, + current_token_ids=new_token_ids, + delta_token_ids=delta_token_ids, + request=tp_request, + ) + except Exception as e: + print(f"Streaming tool parser error (falling back to " + f"buffer for the rest of the stream): {e}", + file=sys.stderr) + native_streaming_error = True + msg = None + if msg is not None: + tc_protos = [] + for tc in (msg.tool_calls or []): + fn = tc.function or None + tc_protos.append(backend_pb2.ToolCallDelta( + index=tc.index, + id=tc.id or "", + name=(fn.name if fn and fn.name else "") or "", + arguments=(fn.arguments if fn and fn.arguments else "") or "", + )) + cd_kwargs = {} + if msg.content: + cd_kwargs["content"] = msg.content + if msg.reasoning: + cd_kwargs["reasoning_content"] = msg.reasoning + if tc_protos: + cd_kwargs["tool_calls"] = tc_protos + if cd_kwargs: + yield backend_pb2.Reply( + message=bytes(msg.content or "", encoding='utf-8'), + chat_deltas=[backend_pb2.ChatDelta(**cd_kwargs)], + ) + # (B) buffer fallback — emit nothing during the stream. + # The post-loop extract_tool_calls block builds the final chunk. + + # Keep track of text + token_ids generated generated_text = iteration_text + generated_token_ids = list(request_output.outputs[0].token_ids) finally: await outputs.aclose() @@ -637,16 +745,19 @@ class BackendServicer(backend_pb2_grpc.BackendServicer): except Exception as e: print(f"Reasoning parser error: {e}", file=sys.stderr) - if self.tool_parser_cls and request.Tools: + # When (A) native streaming ran cleanly, per-delta yields above already + # delivered everything — do NOT extract again on the full text or we'd + # duplicate content/tool_calls into the final chunk. 
+ if has_tool_parser and not (native_streaming and not native_streaming_error): try: - tools = json.loads(request.Tools) - # Some concrete parsers only accept the tokenizer; only the - # abstract base declares the tools kwarg. Try with tools first, - # fall back to tokenizer-only. - try: - tp = self.tool_parser_cls(self.tokenizer, tools=tools) - except TypeError: - tp = self.tool_parser_cls(self.tokenizer) + tp = tp_instance + if tp is None: + # Defensive: tp_instance build failed earlier; reconstruct. + tools = json.loads(request.Tools) + try: + tp = self.tool_parser_cls(self.tokenizer, tools=tools) + except TypeError: + tp = self.tool_parser_cls(self.tokenizer) info = tp.extract_tool_calls(content, request=None) if info.tools_called: content = info.content or "" @@ -659,6 +770,10 @@ class BackendServicer(backend_pb2_grpc.BackendServicer): )) except Exception as e: print(f"Tool parser error: {e}", file=sys.stderr) + elif native_streaming and not native_streaming_error: + # Per-delta path already emitted content + tool_calls; the final + # chat_delta should carry only metadata (token counts, logprobs). + content = "" # Extract token counts prompt_tokens = 0 @@ -698,7 +813,26 @@ class BackendServicer(backend_pb2_grpc.BackendServicer): ) if streaming: - # Final chunk with structured data + # Final chunk with structured data. + # + # If we used the buffer fallback (has_tool_parser=True AND native + # streaming did NOT run cleanly) and the parser found no tool call, + # flush the buffered content as ONE content delta — and clear the + # final chat_delta's content so the metadata chunk does not repeat + # what we just sent. This is the plain-text-with-tool-parser path. 
+ buffered_fallback = ( + has_tool_parser + and not (native_streaming and not native_streaming_error) + ) + if buffered_fallback and not tool_calls_proto and content: + yield backend_pb2.Reply( + message=bytes(content, encoding='utf-8'), + chat_deltas=[backend_pb2.ChatDelta(content=content)], + ) + chat_delta = backend_pb2.ChatDelta( + reasoning_content=reasoning_content, + tool_calls=tool_calls_proto, + ) yield backend_pb2.Reply( message=b"", prompt_tokens=prompt_tokens, diff --git a/backend/python/vllm/install.sh b/backend/python/vllm/install.sh index 320ef6772..85c1e97b0 100755 --- a/backend/python/vllm/install.sh +++ b/backend/python/vllm/install.sh @@ -43,6 +43,24 @@ if [ "x${BUILD_PROFILE}" == "xcublas13" ]; then EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match" fi +# Apple Silicon (Metal/MLX) via vllm-metal. +# vllm-metal (github.com/vllm-project/vllm-metal) brings vLLM to macOS on Apple +# Silicon: it registers through vLLM's platform-plugin entry point +# (metal -> vllm_metal:register), MetalPlatform activates, and the vLLM v1 +# AsyncLLM engine runs on the GPU through MLX. LocalAI's backend.py is UNCHANGED +# on darwin — AsyncEngineArgs(...) -> AsyncLLMEngine.from_engine_args transparently +# resolves to the MLX engine (proven on a real M4 / macOS 26.5 against Qwen3-0.6B). +# +# vllm-metal REQUIRES Python 3.12, so force the portable CPython before the venv +# is created (ensureVenv reads PYTHON_VERSION/PYTHON_PATCH/PY_STANDALONE_TAG). +# The patch + standalone tag mirror the l4t13 cp312 pin — a known-good +# python-build-standalone release that also ships an aarch64-apple-darwin asset. +if [ "$(uname -s)" = "Darwin" ]; then + PYTHON_VERSION="3.12" + PYTHON_PATCH="12" + PY_STANDALONE_TAG="20251120" +fi + # JetPack 7 / L4T arm64 vllm + torch wheels come straight from PyPI now # (torch 2.11+ ships aarch64 + cu130 manylinux wheels and vllm 0.20+ ships # an aarch64 wheel pinned to that torch). 
They're cp312-only, so bump the @@ -57,11 +75,87 @@ if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then PY_STANDALONE_TAG="20251120" fi +# ===================== Apple Silicon (Metal/MLX) ===================== +# Reproduce vllm-metal's upstream installer +# (curl -fsSL https://raw.githubusercontent.com/vllm-project/vllm-metal/main/install.sh) +# but INTO LocalAI's managed venv (ensureVenv) instead of a throwaway +# ~/.venv-vllm-metal, so the backend integrates with LocalAI's venv lifecycle +# (portable CPython, _makeVenvPortable relocation, runtime activation). The +# normal CUDA/CPU installRequirements is skipped on darwin — there is no +# macOS/arm64 vLLM wheel on PyPI; vLLM is built from source and the MLX engine +# is layered on by the vllm-metal wheel. +if [ "$(uname -s)" = "Darwin" ]; then + # Create/activate the portable 3.12 venv. On darwin USE_PIP=true and + # PORTABLE_PYTHON=true (set by scripts/build/python-darwin.sh), so this is a + # `python -m venv` based, relocatable venv. + ensureVenv + + # vllm-metal's installer drives everything through `uv`: building vLLM from + # the CPU requirements needs `--index-strategy unsafe-best-match` (mixes the + # pytorch CPU channel with PyPI), a flag plain pip does not have. The darwin + # venv is pip-based, so bootstrap uv into it. uv honours $VIRTUAL_ENV (set by + # libbackend's _activateVenv) and installs into THIS venv — same pattern the + # intel branch below relies on. + pip install uv + + # The ONLY darwin version pin -- AUTO-BUMPED by .github/bump_vllm_metal.sh, + # which tracks vllm-project/vllm-metal releases (NOT vllm/vllm latest). Keep + # it as a plain double-quoted assignment on its own line so the bumper's sed + # can rewrite it. Darwin therefore follows vllm-metal and can lag the Linux + # vllm pin (requirements-cublas13-after.txt, bumped independently against + # vllm/vllm) until vllm-metal supports a newer vLLM. 
+ VLLM_METAL_VERSION="v0.3.0.dev20260622062346" + + # The coupled vLLM source version is whatever this vllm-metal release builds + # against -- it declares it in its own installer as `vllm_v=`. Derive it from + # the PINNED tag rather than hardcoding a second value that could drift. The + # tag is immutable, so this stays reproducible across rebuilds. + VLLM_VERSION=$(curl -fsSL "https://raw.githubusercontent.com/vllm-project/vllm-metal/${VLLM_METAL_VERSION}/install.sh" \ + | grep -oE 'vllm_v="[0-9]+\.[0-9]+\.[0-9]+"' | head -n1 | cut -d'"' -f2) + if [ -z "${VLLM_VERSION}" ]; then + echo "ERROR: could not derive the vLLM version from vllm-metal ${VLLM_METAL_VERSION}" >&2 + exit 1 + fi + echo "vllm-metal ${VLLM_METAL_VERSION} builds against vLLM ${VLLM_VERSION}" + + _vllm_src=$(mktemp -d) + trap 'rm -rf "${_vllm_src}"' EXIT + pushd "${_vllm_src}" + # 1) Build vLLM ${VLLM_VERSION} from the release source tarball against + # the CPU requirements. vllm-metal layers its MLX platform plugin on + # top of this exact build. + curl -fsSL -o "vllm-${VLLM_VERSION}.tar.gz" \ + "https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}.tar.gz" + tar -xzf "vllm-${VLLM_VERSION}.tar.gz" + pushd "vllm-${VLLM_VERSION}" + uv pip install -r requirements/cpu.txt --index-strategy unsafe-best-match + # -Wno-parentheses: clang on macOS treats one of vLLM's C++ warnings + # as an error without it (matches the upstream installer's CXXFLAGS). + CXXFLAGS="-Wno-parentheses" uv pip install . + popd + popd + + # 2) Install the prebuilt vllm-metal wheel for the PINNED release. It pulls + # mlx / mlx-metal as deps and registers the `metal` platform plugin that + # backend.py resolves to at engine-init time. Build the release-asset URL + # deterministically (tag + the cp312/arm64 wheel name) rather than querying + # api.github.com, whose unauthenticated rate limit (60/hr per IP) 403s on + # shared CI runners. The wheel version is the tag without its leading 'v'. 
+ _metal_wheel="vllm_metal-${VLLM_METAL_VERSION#v}-cp312-cp312-macosx_11_0_arm64.whl" + _metal_wheel_url="https://github.com/vllm-project/vllm-metal/releases/download/${VLLM_METAL_VERSION}/${_metal_wheel}" + echo "Installing vllm-metal wheel: ${_metal_wheel_url}" + uv pip install "${_metal_wheel_url}" + + # Generate the gRPC stubs (backend_pb2*). installRequirements normally does + # this via runProtogen at the end; we skipped installRequirements on darwin, + # so call it explicitly here. + runProtogen + # Intel XPU has no upstream-published vllm wheels, so we always build vllm # from source against torch-xpu and replace the default triton with # triton-xpu (matching torch 2.11). Mirrors the upstream procedure: # https://github.com/vllm-project/vllm/blob/main/docs/getting_started/installation/gpu.xpu.inc.md -if [ "x${BUILD_TYPE}" == "xintel" ]; then +elif [ "x${BUILD_TYPE}" == "xintel" ]; then # Hide requirements-intel-after.txt so installRequirements doesn't # try `pip install vllm` (would either fail or grab a non-XPU wheel). _intel_after="${backend_dir}/requirements-intel-after.txt" diff --git a/backend/python/vllm/requirements-cublas13-after.txt b/backend/python/vllm/requirements-cublas13-after.txt index 62c486139..c04a25ab1 100644 --- a/backend/python/vllm/requirements-cublas13-after.txt +++ b/backend/python/vllm/requirements-cublas13-after.txt @@ -4,4 +4,7 @@ # instead — the cublas13 case in install.sh adds --index-strategy=unsafe-best-match # so uv consults this index alongside PyPI. --extra-index-url https://wheels.vllm.ai/0.23.0/cu130 +# VERSION COUPLING: darwin/Apple-Silicon builds use vllm-metal (see install.sh), +# which pins this exact vLLM version. Bumping vllm here means coordinating with a +# vllm-metal release that supports the new version, or macOS/Metal builds break. 
vllm==0.23.0 diff --git a/backend/python/vllm/test.py b/backend/python/vllm/test.py index 25a7f54e6..d00595f01 100644 --- a/backend/python/vllm/test.py +++ b/backend/python/vllm/test.py @@ -278,4 +278,261 @@ class TestBackendServicer(unittest.TestCase): print(err) self.fail("Embedding service failed") finally: - self.tearDown() \ No newline at end of file + self.tearDown() + + +class TestStreamingToolParser(unittest.TestCase): + """ + Server-less unit tests for the streaming + tool-parser machinery in + BackendServicer._predict. These tests instantiate BackendServicer + directly and mock the vLLM engine + tool parser, so they do not need + a GPU, a model, or a running gRPC server. Kept in a separate class to + avoid the parent setUp() which spawns a subprocess. + + Covers #582 (follow-up to #10346): + 1. Markup-leak prevention with a non-streaming parser (buffer fallback) + 2. No content duplication on the plain-text path with the buffer fallback + 3. Native streaming progressive plain-text emission + 4. Native streaming structured tool_call, no markup leak + 5. Parser exception → graceful fallback to buffer, still no markup + 6. 
No-tool-parser regression: unchanged per-delta content stream + """ + + @staticmethod + def _make_generate(chunks): + """Build a fake vLLM engine.generate that yields cumulative chunks.""" + from types import SimpleNamespace + async def gen(*a, **k): + for i, t in enumerate(chunks): + yield SimpleNamespace( + outputs=[SimpleNamespace( + text=t, + token_ids=list(range(i + 1)), + logprobs=None, + )], + prompt_token_ids=[0], + ) + return lambda *a, **k: gen() + + @staticmethod + def _collect(servicer, req): + import asyncio + async def run(): + return [r async for r in servicer._predict(req, None, streaming=True)] + return asyncio.run(run()) + + def _new_servicer(self): + import sys, os + sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) + from backend import BackendServicer + s = BackendServicer() + s.reasoning_parser_cls = None + s.tool_parser_cls = None + s.tokenizer = None + return s + + # ── Case 1+2: parser without streaming method → buffer fallback ── + def test_buffer_path_no_markup_no_duplication(self): + from types import SimpleNamespace + + def parser_cls(called, content_text, calls): + class _P: + def __init__(self, tokenizer, tools=None): + pass + # NOTE: NO extract_tool_calls_streaming → takes the buffer path + def extract_tool_calls(self, c, request=None): + return SimpleNamespace( + tools_called=called, content=content_text, tool_calls=calls, + ) + return _P + + tools_json = '[{"type":"function","function":{"name":"calc","parameters":{}}}]' + + # Tool-call case: no raw markup in any delta.content + s = self._new_servicer() + s.llm = SimpleNamespace(generate=self._make_generate([ + '\n{"name": "calc"', + '\n{"name": "calc", "arguments": {"x": 1}}\n', + ])) + call = SimpleNamespace(id="call_1", + function=SimpleNamespace(name="calc", arguments='{"x": 1}')) + s.tool_parser_cls = parser_cls(True, "", [call]) + req = backend_pb2.PredictOptions(Prompt="x", Tools=tools_json) + replies = self._collect(s, req) + contents = [cd.content for r in 
replies for cd in r.chat_deltas if cd.content] + self.assertFalse( + any(" 0, + "Plain-text response not streamed progressively (native streaming inactive?)", + ) + assembled = "".join( + cd.content for r in replies for cd in r.chat_deltas if cd.content + ) + self.assertEqual( + assembled, "Paris is the capital of France.", + f"Assembled content wrong: {assembled!r}", + ) + + # ── Case 4: native streaming, structured tool_call, no markup ── + def test_native_streaming_tool_call_no_markup_leak(self): + from types import SimpleNamespace + + class _DeltaMsg: + def __init__(self, content=None, reasoning=None, tool_calls=None): + self.content = content + self.reasoning = reasoning + self.tool_calls = tool_calls or [] + + class _ToolCallStreamer: + def __init__(self, tokenizer, tools=None): + self._emitted = False + def extract_tool_calls(self, c, request=None): + raise AssertionError("extract_tool_calls invoked on native-streaming path") + def extract_tool_calls_streaming( + self, previous_text, current_text, delta_text, + previous_token_ids, current_token_ids, delta_token_ids, request, + ): + if "" in current_text and not self._emitted: + self._emitted = True + fn = SimpleNamespace(name="calc", arguments='{"x": 1}') + tc = SimpleNamespace(id="call_1", type="function", index=0, function=fn) + return _DeltaMsg(tool_calls=[tc]) + return None + + s = self._new_servicer() + s.llm = SimpleNamespace(generate=self._make_generate([ + '\n', + '\n{"name": "calc"', + '\n{"name": "calc", "arguments": {"x": 1}}\n', + ])) + s.tool_parser_cls = _ToolCallStreamer + req = backend_pb2.PredictOptions( + Prompt="x", + Tools='[{"type":"function","function":{"name":"calc","parameters":{}}}]', + ) + replies = self._collect(s, req) + + contents = [cd.content for r in replies for cd in r.chat_deltas if cd.content] + self.assertFalse( + any("" in c for c in contents), + f"markup leaked as content: {contents!r}", + ) + names = [tc.name for r in replies for cd in r.chat_deltas for tc in 
cd.tool_calls if tc.name] + args = [tc.arguments for r in replies for cd in r.chat_deltas for tc in cd.tool_calls if tc.arguments] + self.assertIn("calc", names, f"tool_call name missing; got {names!r}") + self.assertIn('{"x": 1}', args, f"tool_call args missing; got {args!r}") + + # ── Case 5: parser exception → fallback to buffer, no leak ── + def test_native_streaming_parser_exception_falls_back_to_buffer(self): + from types import SimpleNamespace + call = SimpleNamespace(id="call_1", + function=SimpleNamespace(name="calc", arguments='{"x": 1}')) + + class _BrokenStreamer: + def __init__(self, tokenizer, tools=None): + pass + def extract_tool_calls(self, c, request=None): + return SimpleNamespace(tools_called=True, content="", tool_calls=[call]) + def extract_tool_calls_streaming(self, *a, **kw): + raise RuntimeError("simulated parser bug") + + s = self._new_servicer() + s.llm = SimpleNamespace(generate=self._make_generate([ + '\n{"name": "calc"', + '\n{"name": "calc", "arguments": {"x": 1}}\n', + ])) + s.tool_parser_cls = _BrokenStreamer + req = backend_pb2.PredictOptions( + Prompt="x", + Tools='[{"type":"function","function":{"name":"calc","parameters":{}}}]', + ) + replies = self._collect(s, req) + + contents = [cd.content for r in replies for cd in r.chat_deltas if cd.content] + self.assertFalse( + any(" 0 { + xlog.Info("Reaped stale partial downloads", "count", removed) + } if options.GeneratedContentDir != "" { err := os.MkdirAll(options.GeneratedContentDir, 0o750) if err != nil { @@ -633,6 +644,12 @@ func loadRuntimeSettingsFromFile(options *config.ApplicationConfig) { options.ForceEvictionWhenBusy = *settings.ForceEvictionWhenBusy } } + if settings.SizeAwareEviction != nil { + // Only apply if current value is default (false), suggesting it wasn't set from env var + if !options.SizeAwareEviction { + options.SizeAwareEviction = *settings.SizeAwareEviction + } + } if settings.LRUEvictionMaxRetries != nil { // Only apply if current value is default (30), 
suggesting it wasn't set from env var if options.LRUEvictionMaxRetries == 0 { @@ -733,6 +750,20 @@ func loadRuntimeSettingsFromFile(options *config.ApplicationConfig) { options.MITMListen = *settings.MITMListen } + // Instance-wide default PII detectors. LOCALAI_PII_DEFAULT_DETECTORS (via + // WithPIIDefaultDetectors) wins when set; otherwise the file is the source + // — apply it only when the env/CLI left the value empty, mirroring the + // "env > file" precedence used for the other fields. This must land before + // startMITMIfConfigured (called right after this loader): the cloud-proxy + // listener resolves each intercept host's detectors once at start via + // ResolvePIIPolicy, and a MITM model that names no detectors of its own + // falls back to these defaults. Without it the listener (and request-side + // default redaction) starts with an empty detector set and forwards + // traffic unredacted even though pii_default_detectors is on disk. + if settings.PIIDefaultDetectors != nil && len(options.PIIDefaultDetectors) == 0 { + options.PIIDefaultDetectors = append([]string(nil), (*settings.PIIDefaultDetectors)...) 
+ } + // Backend upgrade flags if settings.AutoUpgradeBackends != nil { if !options.AutoUpgradeBackends { @@ -836,6 +867,7 @@ func initializeWatchdog(application *Application, options *config.ApplicationCon model.WithLRULimit(lruLimit), model.WithMemoryReclaimer(options.MemoryReclaimerEnabled, options.MemoryReclaimerThreshold), model.WithForceEvictionWhenBusy(options.ForceEvictionWhenBusy), + model.WithSizeAwareEviction(options.SizeAwareEviction), ) application.ModelLoader().SetWatchDog(wd) diff --git a/core/application/watchdog.go b/core/application/watchdog.go index 9658b5114..c71871d87 100644 --- a/core/application/watchdog.go +++ b/core/application/watchdog.go @@ -90,6 +90,7 @@ func (a *Application) startWatchdog() error { model.WithLRULimit(lruLimit), model.WithMemoryReclaimer(appConfig.MemoryReclaimerEnabled, appConfig.MemoryReclaimerThreshold), model.WithForceEvictionWhenBusy(appConfig.ForceEvictionWhenBusy), + model.WithSizeAwareEviction(appConfig.SizeAwareEviction), ) // Create new stop channel BEFORE setting up any goroutines diff --git a/core/backend/options.go b/core/backend/options.go index d66b55049..d3ccb2f42 100644 --- a/core/backend/options.go +++ b/core/backend/options.go @@ -1,6 +1,7 @@ package backend import ( + "context" "encoding/json" "fmt" "math/rand/v2" @@ -12,7 +13,9 @@ import ( "github.com/mudler/LocalAI/core/config" "github.com/mudler/LocalAI/core/trace" pb "github.com/mudler/LocalAI/pkg/grpc/proto" + "github.com/mudler/LocalAI/pkg/downloader" "github.com/mudler/LocalAI/pkg/model" + "github.com/mudler/LocalAI/pkg/vram" "github.com/mudler/xlog" ) @@ -33,6 +36,67 @@ func recordModelLoadFailure(appConfig *config.ApplicationConfig, modelName, back }) } +// estimateModelSizeBytes uses the unified EstimateModel entry point to compute +// the total weight-file size for a model config. 
It collects all weight files +// from DownloadFiles, Model, and MMProj, and also extracts the HuggingFace +// repo ID so EstimateModel can fall back to the HF API when local file +// metadata is unavailable (e.g. not-yet-downloaded models). +func estimateModelSizeBytes(c config.ModelConfig, modelsPath string) int64 { + seen := make(map[string]bool) + input := vram.ModelEstimateInput{} + + addFile := func(uri string) { + if !vram.IsWeightFile(uri) { + return + } + resolved := uri + if !strings.Contains(uri, "://") { + resolved = "file://" + filepath.Join(modelsPath, uri) + } + if seen[resolved] { + return + } + seen[resolved] = true + input.Files = append(input.Files, vram.FileInput{URI: resolved}) + } + + // tryHFRepo resolves any huggingface:// or hf:// URI to an HTTPS URL and + // then extracts the org/model repo ID for use as the HF fallback path. + tryHFRepo := func(uri string) { + if input.HFRepo != "" { + return + } + resolved := downloader.URI(uri).ResolveURL() + if repoID, ok := vram.ExtractHFRepoID(resolved); ok { + input.HFRepo = repoID + } + } + + for _, f := range c.DownloadFiles { + uriStr := string(f.URI) + addFile(uriStr) + tryHFRepo(uriStr) + } + addFile(c.Model) + tryHFRepo(c.Model) + if c.MMProj != "" { + addFile(c.MMProj) + } + + if len(input.Files) == 0 && input.HFRepo == "" { + return 0 + } + + ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) + defer cancel() + + result, err := vram.EstimateModelMultiContext(ctx, input, nil) + if err != nil || result.SizeBytes == 0 { + return 0 + } + return int64(result.SizeBytes) +} + func ModelOptions(c config.ModelConfig, so *config.ApplicationConfig, opts ...model.Option) []model.Option { defOpts := []model.Option{ model.WithBackendString(c.Backend), @@ -70,6 +134,10 @@ func ModelOptions(c config.ModelConfig, so *config.ApplicationConfig, opts ...mo defOpts = append(defOpts, model.WithExternalBackend(k, v)) } + if sizeBytes := estimateModelSizeBytes(c, 
so.SystemState.Model.ModelsPath); sizeBytes > 0 { + defOpts = append(defOpts, model.WithModelSizeBytes(sizeBytes)) + } + return append(defOpts, opts...) } @@ -90,10 +158,11 @@ func getSeed(c config.ModelConfig) int32 { // DefaultContextSize and DefaultBatchSize are the backend's fallbacks when a // model config leaves them unset. Exported so callers that must respect the // effective decode window — notably the router's prompt trimmer — resolve the -// same numbers grpcModelOpts does instead of guessing. +// same numbers grpcModelOpts does instead of guessing. The values are owned by +// core/config (single source of truth shared with the config default tiers). const ( - DefaultContextSize = 4096 - DefaultBatchSize = 512 + DefaultContextSize = config.DefaultContextSize + DefaultBatchSize = config.DefaultPhysicalBatch ) // EffectiveContextSize is the context window the backend will run with: the @@ -132,7 +201,7 @@ func grpcModelOpts(c config.ModelConfig, modelPath string) *pb.ModelOptions { ctxSize := EffectiveContextSize(c) b := EffectiveBatchSize(c) - flashAttention := "auto" + flashAttention := config.DefaultFlashAttention if c.FlashAttention != nil { flashAttention = *c.FlashAttention @@ -178,7 +247,7 @@ func grpcModelOpts(c config.ModelConfig, modelPath string) *pb.ModelOptions { mmlock = *c.MMlock } - nGPULayers := 9999999 + nGPULayers := config.DefaultNGPULayers if c.NGPULayers != nil { nGPULayers = *c.NGPULayers } diff --git a/core/backend/sound_classification.go b/core/backend/sound_classification.go new file mode 100644 index 000000000..666c32321 --- /dev/null +++ b/core/backend/sound_classification.go @@ -0,0 +1,88 @@ +package backend + +import ( + "context" + "fmt" + "sort" + + "github.com/mudler/LocalAI/core/config" + "github.com/mudler/LocalAI/core/schema" + + grpcPkg "github.com/mudler/LocalAI/pkg/grpc" + "github.com/mudler/LocalAI/pkg/grpc/proto" + "github.com/mudler/LocalAI/pkg/model" +) + +// SoundDetectionRequest carries the knobs the HTTP layer 
collects for an +// audio-tagging / sound-event-classification call. Audio is the path to the +// uploaded clip on disk; TopK and Threshold are optional (0 = backend default). +type SoundDetectionRequest struct { + Audio string + TopK int32 + Threshold float32 +} + +func (r *SoundDetectionRequest) toProto() *proto.SoundDetectionRequest { + return &proto.SoundDetectionRequest{ + Src: r.Audio, + TopK: r.TopK, + Threshold: r.Threshold, + } +} + +func loadSoundDetectionModel(ml *model.ModelLoader, modelConfig config.ModelConfig, appConfig *config.ApplicationConfig) (grpcPkg.Backend, error) { + if modelConfig.Backend == "" { + return nil, fmt.Errorf("sound classification: model %q has no backend set; supported backends include ced", modelConfig.Name) + } + opts := ModelOptions(modelConfig, appConfig) + m, err := ml.Load(opts...) + if err != nil { + recordModelLoadFailure(appConfig, modelConfig.Name, modelConfig.Backend, err, nil) + return nil, err + } + if m == nil { + return nil, fmt.Errorf("could not load sound classification model") + } + return m, nil +} + +// ModelSoundDetection runs the SoundDetection RPC against the configured +// backend and returns a normalized schema.SoundClassificationResult. +func ModelSoundDetection(ctx context.Context, req SoundDetectionRequest, ml *model.ModelLoader, modelConfig config.ModelConfig, appConfig *config.ApplicationConfig) (*schema.SoundClassificationResult, error) { + m, err := loadSoundDetectionModel(ml, modelConfig, appConfig) + if err != nil { + return nil, err + } + + r, err := m.SoundDetection(ctx, req.toProto()) + if err != nil { + return nil, err + } + return soundClassificationResultFromProto(modelConfig.Name, r), nil +} + +// soundClassificationResultFromProto maps the backend detections to the +// HTTP-facing schema, sorted by score descending (a stable sort, so ties keep +// the backend's original order).
+func soundClassificationResultFromProto(modelName string, r *proto.SoundDetectionResponse) *schema.SoundClassificationResult { + out := &schema.SoundClassificationResult{ + Model: modelName, + Detections: []schema.SoundClassification{}, + } + if r == nil { + return out + } + for _, d := range r.Detections { + if d == nil { + continue + } + out.Detections = append(out.Detections, schema.SoundClassification{ + Index: int(d.Index), + Label: d.Label, + Score: d.Score, + }) + } + sort.SliceStable(out.Detections, func(i, j int) bool { + return out.Detections[i].Score > out.Detections[j].Score + }) + return out +} diff --git a/core/cli/run.go b/core/cli/run.go index d011f3293..fd7ba8cd9 100644 --- a/core/cli/run.go +++ b/core/cli/run.go @@ -93,6 +93,7 @@ type RunCMD struct { EnableMemoryReclaimer bool `env:"LOCALAI_MEMORY_RECLAIMER,MEMORY_RECLAIMER,LOCALAI_GPU_RECLAIMER,GPU_RECLAIMER" default:"false" help:"Enable memory threshold monitoring to auto-evict backends when memory usage exceeds threshold (uses GPU VRAM if available, otherwise RAM)" group:"backends"` MemoryReclaimerThreshold float64 `env:"LOCALAI_MEMORY_RECLAIMER_THRESHOLD,MEMORY_RECLAIMER_THRESHOLD,LOCALAI_GPU_RECLAIMER_THRESHOLD,GPU_RECLAIMER_THRESHOLD" default:"0.95" help:"Memory usage threshold (0.0-1.0) that triggers backend eviction (default 0.95 = 95%%)" group:"backends"` ForceEvictionWhenBusy bool `env:"LOCALAI_FORCE_EVICTION_WHEN_BUSY,FORCE_EVICTION_WHEN_BUSY" default:"false" help:"Force eviction even when models have active API calls (default: false for safety)" group:"backends"` + SizeAwareEviction bool `env:"LOCALAI_SIZE_AWARE_EVICTION,SIZE_AWARE_EVICTION" default:"false" help:"Evict the largest loaded model first rather than the least-recently-used one, keeping small utility models resident and maximizing freed memory per eviction" group:"backends"` LRUEvictionMaxRetries int `env:"LOCALAI_LRU_EVICTION_MAX_RETRIES,LRU_EVICTION_MAX_RETRIES" default:"30" help:"Maximum number of retries when waiting 
for busy models to become idle before eviction (default: 30)" group:"backends"` LRUEvictionRetryInterval string `env:"LOCALAI_LRU_EVICTION_RETRY_INTERVAL,LRU_EVICTION_RETRY_INTERVAL" default:"1s" help:"Interval between retries when waiting for busy models to become idle (e.g., 1s, 2s) (default: 1s)" group:"backends"` Federated bool `env:"LOCALAI_FEDERATED,FEDERATED" help:"Enable federated instance" group:"federated"` @@ -139,7 +140,7 @@ type RunCMD struct { OIDCIssuer string `env:"LOCALAI_OIDC_ISSUER" help:"OIDC issuer URL for auto-discovery" group:"auth"` OIDCClientID string `env:"LOCALAI_OIDC_CLIENT_ID" help:"OIDC Client ID (auto-enables auth)" group:"auth"` OIDCClientSecret string `env:"LOCALAI_OIDC_CLIENT_SECRET" help:"OIDC Client Secret" group:"auth"` - AuthBaseURL string `env:"LOCALAI_BASE_URL" help:"Base URL for OAuth callbacks (e.g. http://localhost:8080)" group:"auth"` + ExternalBaseURL string `env:"LOCALAI_BASE_URL" help:"External base URL of this instance (e.g. https://localhost:8080). Used for OAuth callbacks and self-referential links (generated images/videos, job status). When unset, derived from X-Forwarded-Proto/Host or Forwarded headers." group:"api"` AuthAdminEmail string `env:"LOCALAI_ADMIN_EMAIL" help:"Email address to auto-promote to admin role" group:"auth"` AuthRegistrationMode string `env:"LOCALAI_REGISTRATION_MODE" default:"open" help:"Registration mode: 'open' (default), 'approval', or 'invite' (invite code required)" group:"auth"` DisableLocalAuth bool `env:"LOCALAI_DISABLE_LOCAL_AUTH" default:"false" help:"Disable local email/password registration and login (use with OAuth/OIDC-only setups)" group:"auth"` @@ -180,6 +181,8 @@ type RunCMD struct { // Cloud-proxy MITM listener (off by default). MITMListen string `env:"LOCALAI_MITM_LISTEN" help:"Address (host:port) for the cloudproxy MITM listener. Empty = disabled. Clients set HTTPS_PROXY=http://:. 
Intercept hosts are declared per-model via the model YAML mitm.hosts: block; create one from the Add Model UI." group:"middleware"` MITMCADir string `env:"LOCALAI_MITM_CA_DIR" type:"path" help:"Directory holding the MITM proxy CA cert + key. Defaults to /mitm-ca." group:"middleware"` + + PIIDefaultDetectors []string `env:"LOCALAI_PII_DEFAULT_DETECTORS" help:"Instance-wide default PII/secret detector model names applied to any PII-enabled model (chiefly cloud-proxy / MITM models) that names no pii.detectors of its own. Comma-separated, e.g. privacy-filter-nemotron,secret-filter. Takes precedence over the value persisted via the Middleware UI." group:"middleware"` } func (r *RunCMD) Run(ctx *cliContext.Context) error { @@ -242,6 +245,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error { config.WithAPIAddress(r.Address), config.WithMITMListen(r.MITMListen), config.WithMITMCADir(r.MITMCADir), + config.WithPIIDefaultDetectors(r.PIIDefaultDetectors), config.WithAgentJobRetentionDays(r.AgentJobRetentionDays), config.WithLlamaCPPTunnelCallback(func(tunnels []string) { tunnelEnvVar := strings.Join(tunnels, ",") @@ -499,9 +503,6 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error { opts = append(opts, config.WithAuthOIDCClientID(r.OIDCClientID)) opts = append(opts, config.WithAuthOIDCClientSecret(r.OIDCClientSecret)) } - if r.AuthBaseURL != "" { - opts = append(opts, config.WithAuthBaseURL(r.AuthBaseURL)) - } if r.AuthAdminEmail != "" { opts = append(opts, config.WithAuthAdminEmail(r.AuthAdminEmail)) } @@ -519,6 +520,12 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error { } } + // Applied unconditionally: the external base URL governs all self-referential + // links (not just OAuth callbacks), so it must take effect even when auth is off. 
+ if r.ExternalBaseURL != "" { + opts = append(opts, config.WithExternalBaseURL(r.ExternalBaseURL)) + } + if idleWatchDog || busyWatchDog { opts = append(opts, config.EnableWatchDog) if idleWatchDog { @@ -564,6 +571,9 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error { if r.ForceEvictionWhenBusy { opts = append(opts, config.WithForceEvictionWhenBusy(true)) } + if r.SizeAwareEviction { + opts = append(opts, config.WithSizeAwareEviction(true)) + } if r.LRUEvictionMaxRetries > 0 { opts = append(opts, config.WithLRUEvictionMaxRetries(r.LRUEvictionMaxRetries)) } diff --git a/core/config/application_config.go b/core/config/application_config.go index 9ec0bdc33..1821a8441 100644 --- a/core/config/application_config.go +++ b/core/config/application_config.go @@ -49,6 +49,13 @@ type ApplicationConfig struct { P2PNetworkID string Federated bool + // ExternalBaseURL is the externally visible base URL of this instance + // (scheme+host[:port]), set via LOCALAI_BASE_URL. When non-empty it is + // authoritative for every self-referential URL LocalAI emits (OAuth + // callbacks, generated image/video links, async job StatusURLs), + // overriding proxy-header detection. Empty = derive from request headers. + ExternalBaseURL string + // DisableStats turns off per-request token tracking. 
By default the // routing module's billing recorder runs in every mode (including // no-auth single-user) so dashboards and `/api/usage` are immediately @@ -119,6 +126,7 @@ type ApplicationConfig struct { // Eviction settings ForceEvictionWhenBusy bool // Force eviction even when models have active API calls (default: false for safety) + SizeAwareEviction bool // Evict largest models first rather than least-recently-used (default: false) LRUEvictionMaxRetries int // Maximum number of retries when waiting for busy models to become idle (default: 30) LRUEvictionRetryInterval time.Duration // Interval between retries when waiting for busy models (default: 1s) @@ -195,7 +203,6 @@ type AuthConfig struct { OIDCIssuer string // OIDC issuer URL for auto-discovery (e.g. https://accounts.google.com) OIDCClientID string OIDCClientSecret string - BaseURL string // for OAuth callback URLs (e.g. "http://localhost:8080") AdminEmail string // auto-promote to admin on login RegistrationMode string // "open", "approval" (default when empty), "invite" DisableLocalAuth bool // disable local email/password registration and login @@ -488,6 +495,16 @@ func WithForceEvictionWhenBusy(enabled bool) AppOption { } } +// WithSizeAwareEviction enables size-aware eviction ordering. +// When true, the watchdog evicts the largest loaded model first rather than the +// least-recently-used one, keeping small utility models resident and maximizing +// memory freed per eviction. 
+func WithSizeAwareEviction(enabled bool) AppOption { + return func(o *ApplicationConfig) { + o.SizeAwareEviction = enabled + } +} + // WithLRUEvictionMaxRetries sets the maximum number of retries when waiting for busy models to become idle func WithLRUEvictionMaxRetries(maxRetries int) AppOption { return func(o *ApplicationConfig) { @@ -701,6 +718,18 @@ func WithMITMCADir(dir string) AppOption { } } +// WithPIIDefaultDetectors sets the instance-wide default PII/secret detector +// model names applied to any PII-enabled model (chiefly cloud-proxy / MITM +// models) that names no pii.detectors of its own. CLI/env: +// LOCALAI_PII_DEFAULT_DETECTORS. Empty leaves the value to +// runtime_settings.json / the Middleware UI; a non-empty value takes +// precedence over the file (env > file). +func WithPIIDefaultDetectors(detectors []string) AppOption { + return func(o *ApplicationConfig) { + o.PIIDefaultDetectors = detectors + } +} + func WithDynamicConfigDir(dynamicConfigsDir string) AppOption { return func(o *ApplicationConfig) { o.DynamicConfigsDir = dynamicConfigsDir @@ -927,9 +956,9 @@ func WithAuthGitHubClientSecret(clientSecret string) AppOption { } } -func WithAuthBaseURL(baseURL string) AppOption { +func WithExternalBaseURL(url string) AppOption { return func(o *ApplicationConfig) { - o.Auth.BaseURL = baseURL + o.ExternalBaseURL = url } } @@ -1028,6 +1057,7 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings { memoryReclaimerEnabled := o.MemoryReclaimerEnabled memoryReclaimerThreshold := o.MemoryReclaimerThreshold forceEvictionWhenBusy := o.ForceEvictionWhenBusy + sizeAwareEviction := o.SizeAwareEviction lruEvictionMaxRetries := o.LRUEvictionMaxRetries threads := o.Threads contextSize := o.ContextSize @@ -1120,6 +1150,7 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings { MemoryReclaimerEnabled: &memoryReclaimerEnabled, MemoryReclaimerThreshold: &memoryReclaimerThreshold, ForceEvictionWhenBusy: &forceEvictionWhenBusy, + 
SizeAwareEviction: &sizeAwareEviction, LRUEvictionMaxRetries: &lruEvictionMaxRetries, LRUEvictionRetryInterval: &lruEvictionRetryInterval, Threads: &threads, @@ -1244,6 +1275,10 @@ func (o *ApplicationConfig) ApplyRuntimeSettings(settings *RuntimeSettings) (req o.ForceEvictionWhenBusy = *settings.ForceEvictionWhenBusy // This setting doesn't require restart, can be updated dynamically } + if settings.SizeAwareEviction != nil { + o.SizeAwareEviction = *settings.SizeAwareEviction + // This setting doesn't require restart, can be updated dynamically + } if settings.LRUEvictionMaxRetries != nil { o.LRUEvictionMaxRetries = *settings.LRUEvictionMaxRetries // This setting doesn't require restart, can be updated dynamically diff --git a/core/config/backend_capabilities.go b/core/config/backend_capabilities.go index eba8c3c37..cc9567887 100644 --- a/core/config/backend_capabilities.go +++ b/core/config/backend_capabilities.go @@ -8,27 +8,28 @@ import ( // Usecase name constants — the canonical string values used in gallery entries, // model configs (known_usecases), and UsecaseInfoMap keys. 
const ( - UsecaseChat = "chat" - UsecaseCompletion = "completion" - UsecaseEdit = "edit" - UsecaseVision = "vision" - UsecaseEmbeddings = "embeddings" - UsecaseTokenize = "tokenize" - UsecaseImage = "image" - UsecaseVideo = "video" - UsecaseTranscript = "transcript" - UsecaseTTS = "tts" - UsecaseSoundGeneration = "sound_generation" - UsecaseRerank = "rerank" - UsecaseDetection = "detection" - UsecaseDepth = "depth" - UsecaseVAD = "vad" - UsecaseAudioTransform = "audio_transform" - UsecaseDiarization = "diarization" - UsecaseRealtimeAudio = "realtime_audio" - UsecaseFaceRecognition = "face_recognition" - UsecaseSpeakerRecognition = "speaker_recognition" - UsecaseTokenClassify = "token_classify" + UsecaseChat = "chat" + UsecaseCompletion = "completion" + UsecaseEdit = "edit" + UsecaseVision = "vision" + UsecaseEmbeddings = "embeddings" + UsecaseTokenize = "tokenize" + UsecaseImage = "image" + UsecaseVideo = "video" + UsecaseTranscript = "transcript" + UsecaseTTS = "tts" + UsecaseSoundGeneration = "sound_generation" + UsecaseRerank = "rerank" + UsecaseDetection = "detection" + UsecaseDepth = "depth" + UsecaseVAD = "vad" + UsecaseAudioTransform = "audio_transform" + UsecaseDiarization = "diarization" + UsecaseSoundClassification = "sound_classification" + UsecaseRealtimeAudio = "realtime_audio" + UsecaseFaceRecognition = "face_recognition" + UsecaseSpeakerRecognition = "speaker_recognition" + UsecaseTokenClassify = "token_classify" ) // GRPCMethod identifies a Backend service RPC from backend.proto. 
@@ -51,6 +52,7 @@ const ( MethodVAD GRPCMethod = "VAD" MethodAudioTransform GRPCMethod = "AudioTransform" MethodDiarize GRPCMethod = "Diarize" + MethodSoundDetection GRPCMethod = "SoundDetection" MethodAudioToAudioStream GRPCMethod = "AudioToAudioStream" MethodFaceVerify GRPCMethod = "FaceVerify" MethodFaceAnalyze GRPCMethod = "FaceAnalyze" @@ -165,6 +167,11 @@ var UsecaseInfoMap = map[string]UsecaseInfo{ GRPCMethod: MethodDiarize, Description: "Speaker diarization (who-spoke-when, per-speaker segments) via the Diarize RPC.", }, + UsecaseSoundClassification: { + Flag: FLAG_SOUND_CLASSIFICATION, + GRPCMethod: MethodSoundDetection, + Description: "Sound-event classification / audio tagging (scored AudioSet labels like baby cry, glass breaking, alarms) via the SoundDetection RPC.", + }, UsecaseRealtimeAudio: { Flag: FLAG_REALTIME_AUDIO, GRPCMethod: MethodAudioToAudioStream, diff --git a/core/config/defaults.go b/core/config/defaults.go new file mode 100644 index 000000000..18625fab3 --- /dev/null +++ b/core/config/defaults.go @@ -0,0 +1,30 @@ +package config + +// Canonical default values. +// +// These are owned here so the two layers that need them share a single source +// of truth: the config tiers (ApplyInference/Hardware/Serving/Generic — which +// *decide* defaults) and core/backend/options.go (which *translates* a +// ModelConfig to the backend wire format and supplies the same fallbacks +// defensively). Previously these were duplicated as literals across both +// packages and had drifted (e.g. n_gpu_layers 9999999 vs 99999999, two batch +// constants of 512). core/backend imports core/config, so backend references +// these; config never imports backend. +const ( + // DefaultContextSize is the fallback context window when none is configured + // or estimable from the model. + DefaultContextSize = 4096 + + // GGUFFallbackContextSize is the context window for a GGUF model whose + // metadata yields no usable estimate (see guessGGUFFromFile). 
Deliberately + // smaller than DefaultContextSize to stay conservative on memory there. + GGUFFallbackContextSize = 1024 + + // DefaultNGPULayers means "offload all layers"; the backend (fit_params) + // clamps to what actually fits in device memory. + DefaultNGPULayers = 99999999 + + // DefaultFlashAttention is the flash-attention mode default; "auto" lets the + // backend enable it when the model + backend support it. + DefaultFlashAttention = "auto" +) diff --git a/core/config/generic_defaults.go b/core/config/generic_defaults.go new file mode 100644 index 000000000..57cfba514 --- /dev/null +++ b/core/config/generic_defaults.go @@ -0,0 +1,115 @@ +package config + +import "os" + +// ApplyGenericDefaults fills the generic fallback values applied after the +// higher-priority tiers (ApplyInferenceDefaults for the model family, +// ApplyHardwareDefaults for the device, ApplyServingDefaults for serving +// policy): sampling parameters and a few runtime flags. Like the other tiers it +// only fills values still left unset, so model-family / explicit config wins. +func ApplyGenericDefaults(cfg *ModelConfig) { + if cfg == nil { + return + } + + // https://github.com/ggerganov/llama.cpp/blob/75cd4c77292034ecec587ecb401366f57338f7c0/common/sampling.h#L22 + defaultTopP := 0.95 + defaultTopK := 40 + defaultMinP := 0.0 + defaultTemp := 0.9 + // https://github.com/mudler/LocalAI/issues/2780 + defaultMirostat := 0 + defaultMirostatTAU := 5.0 + defaultMirostatETA := 0.1 + defaultTypicalP := 1.0 + defaultTFZ := 1.0 + defaultZero := 0 + + trueV := true + falseV := false + + if cfg.Seed == nil { + // random number generator seed + defaultSeed := RAND_SEED + cfg.Seed = &defaultSeed + } + + // top_k=40 is llama.cpp's sampling default and is wrong for backends whose + // native default differs (issue #6632). Only inject it for the llama.cpp + // family and the empty/auto backend; leave TopK nil for known non-llama + // backends (e.g. 
mlx, whose intended default is top_k=0) so the wire value + // is 0 rather than a silently-changed 40. + if cfg.TopK == nil && UsesLlamaSamplerDefaults(cfg.Backend) { + cfg.TopK = &defaultTopK + } + + if cfg.MinP == nil { + cfg.MinP = &defaultMinP + } + + if cfg.TypicalP == nil { + cfg.TypicalP = &defaultTypicalP + } + + if cfg.TFZ == nil { + cfg.TFZ = &defaultTFZ + } + + if cfg.MMap == nil { + // MMap is enabled by default + + // Only exception is for Intel GPUs + if os.Getenv("XPU") != "" { + cfg.MMap = &falseV + } else { + cfg.MMap = &trueV + } + } + + if cfg.MMlock == nil { + // MMlock is disabled by default + cfg.MMlock = &falseV + } + + if cfg.TopP == nil { + cfg.TopP = &defaultTopP + } + if cfg.Temperature == nil { + cfg.Temperature = &defaultTemp + } + + if cfg.Maxtokens == nil { + cfg.Maxtokens = &defaultZero + } + + if cfg.Mirostat == nil { + cfg.Mirostat = &defaultMirostat + } + + if cfg.MirostatETA == nil { + cfg.MirostatETA = &defaultMirostatETA + } + + if cfg.MirostatTAU == nil { + cfg.MirostatTAU = &defaultMirostatTAU + } + + if cfg.LowVRAM == nil { + cfg.LowVRAM = &falseV + } + + if cfg.Embeddings == nil { + cfg.Embeddings = &falseV + } + + if cfg.Reranking == nil { + cfg.Reranking = &falseV + } + + if cfg.PromptCacheAll == nil { + // Match upstream llama.cpp's default (common/common.h: cache_prompt = true) + // and let cache_idle_slots / kv_unified actually do useful work; users can + // opt out with an explicit `prompt_cache_all: false` in the model YAML. + cfg.PromptCacheAll = &trueV + } +} diff --git a/core/config/generic_defaults_test.go b/core/config/generic_defaults_test.go new file mode 100644 index 000000000..7cb080c0b --- /dev/null +++ b/core/config/generic_defaults_test.go @@ -0,0 +1,36 @@ +package config_test + +import ( + . "github.com/mudler/LocalAI/core/config" + . "github.com/onsi/ginkgo/v2" + . 
"github.com/onsi/gomega" +) + +var _ = Describe("ApplyGenericDefaults (generic fallback tier)", func() { + It("fills sampling + runtime fallbacks when unset", func() { + cfg := &ModelConfig{} // empty backend uses the llama sampler defaults + ApplyGenericDefaults(cfg) + Expect(cfg.TopP).ToNot(BeNil()) + Expect(*cfg.TopP).To(Equal(0.95)) + Expect(*cfg.TopK).To(Equal(40)) + Expect(*cfg.Temperature).To(Equal(0.9)) + Expect(*cfg.MMap).To(BeTrue()) + Expect(*cfg.MMlock).To(BeFalse()) + Expect(*cfg.PromptCacheAll).To(BeTrue()) + }) + + It("never overrides explicit values", func() { + tk := 7 + tp := 0.5 + cfg := &ModelConfig{} + cfg.TopK = &tk + cfg.TopP = &tp + ApplyGenericDefaults(cfg) + Expect(*cfg.TopK).To(Equal(7)) + Expect(*cfg.TopP).To(Equal(0.5)) + }) + + It("no-ops on nil", func() { + Expect(func() { ApplyGenericDefaults(nil) }).ToNot(Panic()) + }) +}) diff --git a/core/config/gguf.go b/core/config/gguf.go index 5e04f5693..16e43c914 100644 --- a/core/config/gguf.go +++ b/core/config/gguf.go @@ -14,11 +14,6 @@ import ( "github.com/gpustack/gguf-parser-go/util/ptr" ) -const ( - defaultContextSize = 1024 - defaultNGPULayers = 99999999 -) - // reservedNonChatModel reports whether the operator reserved this model for an // internal primitive — the router score classifier or the PII NER // token_classify tier. 
Such a model has no chat template and must not be @@ -38,7 +33,7 @@ func guessGGUFFromFile(cfg *ModelConfig, f *gguf.GGUFFile, defaultCtx int) { cSize := int(ctxSize) cfg.ContextSize = &cSize } else { - defaultCtx = defaultContextSize + defaultCtx = GGUFFallbackContextSize cfg.ContextSize = &defaultCtx } } @@ -52,7 +47,7 @@ func guessGGUFFromFile(cfg *ModelConfig, f *gguf.GGUFFile, defaultCtx int) { if cfg.NGPULayers == nil { // we assume we want to offload all layers - defaultHigh := defaultNGPULayers + defaultHigh := DefaultNGPULayers cfg.NGPULayers = &defaultHigh } diff --git a/core/config/hardware_defaults.go b/core/config/hardware_defaults.go new file mode 100644 index 000000000..81bc9fc7f --- /dev/null +++ b/core/config/hardware_defaults.go @@ -0,0 +1,315 @@ +package config + +import ( + "fmt" + "os" + "strconv" + "strings" + + "github.com/mudler/LocalAI/pkg/xsysinfo" + "github.com/mudler/xlog" +) + +// HardwareDefaultsDisabled reports whether hardware auto-tuning is turned off via +// LOCALAI_DISABLE_HARDWARE_DEFAULTS=true (mirrors LOCALAI_DISABLE_GUESSING). When +// set, ApplyHardwareDefaults and the distributed router's node tuning are +// skipped entirely, so the backend runs llama.cpp's stock batch/parallel +// behavior — an escape hatch for users who want predictable, un-tuned defaults. +func HardwareDefaultsDisabled() bool { + // Read directly like the sibling LOCALAI_DISABLE_GUESSING toggle in + // hooks_llamacpp.go: these config-layer heuristic switches run deep in the + // defaults pipeline with no ApplicationConfig in scope to plumb through. + //nolint:forbidigo // config-layer heuristic toggle, mirrors LOCALAI_DISABLE_GUESSING + return os.Getenv("LOCALAI_DISABLE_HARDWARE_DEFAULTS") == "true" +} + +// Hardware-driven model-config defaults. 
+// +// This sits alongside the other config overriders (ApplyInferenceDefaults for +// model families, guessDefaultsFromFile for GGUF/NGPULayers): they all +// heuristically fill ModelConfig values the user left unset. Hardware tuning is +// the same domain — "adjust the config from the device that will run it" — so +// it lives here rather than scattered into the backend or a separate package. +// +// The heuristics are parameterized on a GPU descriptor (not on direct +// detection) so they apply in both deployment shapes: SetDefaults passes the +// LocalGPU on a single host, and the distributed router passes the *selected +// node's* reported GPU before loading there (the frontend that loaded the +// config may have no GPU at all). + +// GPU describes the device that will run a model. +type GPU struct { + // Vendor is "nvidia", "amd", … (matches xsysinfo vendor constants). + Vendor string + // ComputeCapability is the NVIDIA compute capability as "major.minor" + // (e.g. "12.1" for GB10 / DGX Spark). Empty for non-NVIDIA / unknown. + ComputeCapability string + // VRAM is total device memory in bytes (0 = unknown). + VRAM uint64 +} + +// Physical batch (n_batch / n_ubatch) defaults. +const ( + // DefaultPhysicalBatch is the conservative default when no hardware-specific + // tuning applies. core/backend.DefaultBatchSize references this (single source). + DefaultPhysicalBatch = 512 + // BlackwellPhysicalBatch is the default on NVIDIA Blackwell consumer GPUs + // (sm_12x: sm_120 RTX 50-series, sm_121 GB10 / DGX Spark). A larger physical + // batch materially lifts MoE prefill there (per-expert GEMM tiles fill + // better); measured on a GB10 with Qwen3-30B-A3B to saturate around 2048. + BlackwellPhysicalBatch = 2048 +) + +// IsNVIDIABlackwell reports whether the GPU is in the NVIDIA Blackwell consumer +// family (sm_12x). Datacenter Blackwell (B100/B200/GB200, sm_100 / cc 10.0) +// reports a different compute capability and is intentionally not matched. 
+func (g GPU) IsNVIDIABlackwell() bool { + maj, _ := parseComputeCapability(g.ComputeCapability) + return maj >= 12 +} + +// Compute-buffer headroom guard for the raised physical batch. +// +// Raising n_ubatch grows the CUDA *compute buffer* (the scratch for the forward +// graph), which is allocated PER DEVICE — it does not benefit from a second GPU +// the way weights or KV (which are split across devices) do. The buffer scales +// ~linearly with n_ubatch * n_ctx, so a large context turns the GB10-tuned +// ub2048 into multi-GiB of extra scratch that must fit on a SINGLE card. On a +// 16 GiB consumer Blackwell with a 200k context that overflows (issue #10485), +// even though the GB10 it was measured on (128 GiB unified memory) had room. +// +// These constants size a conservative guard: only raise the batch when the +// extra scratch fits the per-device VRAM ceiling. +const ( + // computeBufferBytesPerCell approximates the CUDA compute-buffer cost of one + // (n_ubatch * n_ctx) cell. Derived from an observed allocation (ub2048 * + // ctx204800 ~= 4.5 GiB => ~11 B/cell) and rounded up to 16 for margin, since + // the real cost also grows with model width (heads / embedding dim) which we + // don't know at config time. + computeBufferBytesPerCell = 16 + // blackwellBatchHeadroomDivisor caps the extra compute buffer from raising the + // physical batch at VRAM/divisor. /4 keeps the bulk of a device for weights + + // KV, which already dominate VRAM use. + blackwellBatchHeadroomDivisor = 4 +) + +// PhysicalBatch returns the canonical physical batch (n_batch/n_ubatch) for the +// given hardware class, ignoring context/VRAM headroom. Use +// PhysicalBatchForContext when a model context and per-device VRAM are known +// (the load paths) so the raised batch can't overflow a single device. 
+func PhysicalBatch(g GPU) int { + if g.IsNVIDIABlackwell() { + return BlackwellPhysicalBatch + } + return DefaultPhysicalBatch +} + +// PhysicalBatchForContext is PhysicalBatch gated on per-device VRAM headroom for +// the given context: it only raises the batch above the conservative default +// when the extra compute buffer (which is allocated on a single device and grows +// with n_ubatch * n_ctx) fits within blackwellBatchHeadroomDivisor of the GPU's +// VRAM. g.VRAM must be the PER-DEVICE ceiling (the smallest device on a +// multi-GPU host), not the summed total — the compute buffer can't be split. +// +// VRAM 0 (unknown) stays conservative rather than risk a per-device OOM; the +// GB10 / unified-memory path reports system RAM, so it still clears the guard. +func PhysicalBatchForContext(g GPU, ctx int) int { + if !g.IsNVIDIABlackwell() { + return DefaultPhysicalBatch + } + if g.VRAM == 0 { + return DefaultPhysicalBatch + } + if largeContextForDevice(g, ctx) { + return DefaultPhysicalBatch + } + return BlackwellPhysicalBatch +} + +// largeContextForDevice reports whether the given context is large relative to +// the per-device VRAM ceiling — the shared "tight single-model fit" signal that +// suppresses BOTH throughput-oriented defaults (the Blackwell batch boost and +// the concurrency slot count). It sizes the extra compute-buffer scratch a +// raised batch would need at this context (which grows ~n_ubatch * n_ctx and +// is allocated per device) and asks whether it overflows a fraction of the +// device VRAM; when it does, the device has no headroom to spend on throughput +// and the conservative defaults must hold (issue #10485). +// +// g.VRAM must be the PER-DEVICE ceiling (the smallest device on a multi-GPU +// host). VRAM 0 (unknown) is treated as not-large so detection gaps don't +// silently disable the defaults. 
+func largeContextForDevice(g GPU, ctx int) bool { + if g.VRAM == 0 { + return false + } + if ctx <= 0 { + ctx = DefaultContextSize + } + extra := uint64(ctx) * uint64(BlackwellPhysicalBatch-DefaultPhysicalBatch) * computeBufferBytesPerCell + return extra > g.VRAM/blackwellBatchHeadroomDivisor +} + +// IsManagedPhysicalBatch reports whether n is a value PhysicalBatch assigns. +// Callers that re-tune a value chosen by an upstream host (the distributed +// router correcting the frontend's guess) use this to avoid clobbering an +// explicit user batch such as 1024. +func IsManagedPhysicalBatch(n int) bool { + return n == DefaultPhysicalBatch || n == BlackwellPhysicalBatch +} + +// Parallel-slot (n_parallel) VRAM tiers. llama.cpp serializes requests at +// n_parallel=1 (the backend default) and only auto-enables continuous batching +// when n_parallel > 1 — so a single-slot default makes concurrent requests +// queue. We default a slot count by GPU size so multi-user serving works out of +// the box. With the backend's unified KV cache the slots SHARE the context +// budget, so more slots add concurrency without multiplying KV memory. +const ( + parallelSlotsVRAMHigh = uint64(32) << 30 // >=32 GiB -> 8 slots + parallelSlotsVRAMMid = uint64(8) << 30 // >=8 GiB -> 4 slots + parallelSlotsVRAMLow = uint64(4) << 30 // >=4 GiB -> 2 slots +) + +// DefaultParallelSlots returns the n_parallel default for the given GPU. Returns +// 1 (no concurrency) when VRAM is unknown or too small, so we never change +// behavior on CPU-only / tiny devices. +func DefaultParallelSlots(g GPU) int { + switch { + case g.VRAM >= parallelSlotsVRAMHigh: + return 8 + case g.VRAM >= parallelSlotsVRAMMid: + return 4 + case g.VRAM >= parallelSlotsVRAMLow: + return 2 + default: + return 1 + } +} + +// ParallelSlotsForContext is DefaultParallelSlots gated on per-device VRAM +// headroom for the given context. 
A large context already claims most of a +// single device's VRAM (the KV cache plus the per-slot compute/checkpoint +// scratch that scales with n_seq_max), so defaulting multiple slots there +// pushes a tight single-model fit into per-device CUDA OOM (issue #10485): the +// model loads but the final allocation (e.g. an MTP draft context's KV cache) +// overflows the tighter card by a few hundred MiB. Returns 1 (no concurrency) +// in that tight regime, otherwise the VRAM-scaled DefaultParallelSlots. +// +// g.VRAM must be the PER-DEVICE ceiling (smallest device on a multi-GPU host). +// It shares largeContextForDevice with the batch boost so both throughput +// defaults are suppressed together; the GB10 / unified-memory path reports +// system RAM and so keeps full concurrency even at large contexts. +func ParallelSlotsForContext(g GPU, ctx int) int { + slots := DefaultParallelSlots(g) + if slots <= 1 || g.VRAM == 0 { + return slots + } + if largeContextForDevice(g, ctx) { + return 1 + } + return slots +} + +// EnsureParallelOptionForContext appends a VRAM-scaled "parallel:N" backend +// option when the model doesn't already set one and the GPU warrants (and has +// headroom for) concurrency at this context. Returns the possibly-extended +// options. Shared by the single-host config path (ApplyHardwareDefaults) and +// the distributed router (per selected node). +func EnsureParallelOptionForContext(opts []string, gpu GPU, ctx int) []string { + if slots := ParallelSlotsForContext(gpu, ctx); slots > 1 && !hasParallelOption(opts) { + return append(opts, fmt.Sprintf("parallel:%d", slots)) + } + return opts +} + +// EnsureParallelOption is EnsureParallelOptionForContext with no known context +// (defaults to DefaultContextSize, which clears the headroom gate on any device +// large enough to warrant concurrency). Kept for callers without a model +// context. 
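The slot-count policy above can be summarized in a small sketch: VRAM tiers pick 8/4/2/1 slots, a context that is already large for the device forces a single slot, and an explicit `parallel` or `n_parallel` option is never overridden. The tier thresholds are copied from the constants in this diff. The helper `hasOption` is a hypothetical stand-in for `backendOptionSet`, whose implementation is not shown; it assumes options are `key:value` strings, as in the tests' `parallel:2`. The `largeCtx` flag stands in for the result of `largeContextForDevice`.

```go
package main

import (
	"fmt"
	"strings"
)

const gib = uint64(1) << 30

// slotsForVRAM reproduces the DefaultParallelSlots tiers:
// >=32 GiB -> 8, >=8 GiB -> 4, >=4 GiB -> 2, otherwise 1.
func slotsForVRAM(vram uint64) int {
	switch {
	case vram >= 32*gib:
		return 8
	case vram >= 8*gib:
		return 4
	case vram >= 4*gib:
		return 2
	default:
		return 1
	}
}

// hasOption is a hypothetical stand-in for backendOptionSet: it assumes
// "key:value" option strings and matches any of the given keys.
func hasOption(opts []string, keys ...string) bool {
	for _, o := range opts {
		k, _, _ := strings.Cut(o, ":")
		k = strings.TrimSpace(k)
		for _, want := range keys {
			if k == want {
				return true
			}
		}
	}
	return false
}

// ensureParallel appends "parallel:N" unless the context already fills the
// device (largeCtx) or the user set parallel/n_parallel explicitly.
func ensureParallel(opts []string, vram uint64, largeCtx bool) []string {
	slots := slotsForVRAM(vram)
	if largeCtx {
		slots = 1 // tight single-model fit: no room for per-slot scratch
	}
	if slots > 1 && !hasOption(opts, "parallel", "n_parallel") {
		return append(opts, fmt.Sprintf("parallel:%d", slots))
	}
	return opts
}
```

Keeping the explicit-option check separate from the VRAM policy is what lets the same helper serve both the single-host path and the distributed router.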
+func EnsureParallelOption(opts []string, gpu GPU) []string { + return EnsureParallelOptionForContext(opts, gpu, 0) +} + +// hasParallelOption reports whether the model already sets parallel/n_parallel +// so we never override an explicit value (helper shared with serving_defaults.go). +func hasParallelOption(opts []string) bool { + return backendOptionSet(opts, "parallel", "n_parallel") +} + +// localGPU builds a GPU descriptor from local detection, used by SetDefaults on +// a single host (the distributed router builds it from the selected node's +// reported info instead). It is a package var so tests can inject a +// deterministic device (detection does a live nvidia-smi call). +var localGPU = func() GPU { + vendor, _ := xsysinfo.DetectGPUVendor() + // Use the SMALLEST device's VRAM, not the summed total: the parallel-slot + // tier and the batch headroom guard both reason about what fits on a single + // card, and per-device compute buffers can't be split across GPUs. Summing + // two 16 GiB cards into "32 GiB" over-provisioned multi-GPU hosts and pushed + // them into OOM (issue #10485). + vram, _ := xsysinfo.MinPerGPUVRAM() + return GPU{ + Vendor: vendor, + ComputeCapability: xsysinfo.NVIDIAComputeCapability(), + VRAM: vram, + } +} + +// ApplyHardwareDefaults fills ModelConfig values that depend on the target GPU +// and were left unset by the user. Currently: a larger physical batch on +// Blackwell (only when the context leaves per-device headroom) and a +// VRAM-scaled parallel slot count. Explicit config always wins (we only touch +// unset values). +func ApplyHardwareDefaults(cfg *ModelConfig, gpu GPU) { + if cfg == nil || HardwareDefaultsDisabled() { + return + } + // Raise the physical batch on Blackwell only when the resulting compute + // buffer fits the per-device VRAM at THIS model's context. Leaving Batch at 0 + // (rather than writing the default 512) preserves the downstream single-pass + // sizing in core/backend.EffectiveBatchSize for embedding/score/rerank.
+ ctx := DefaultContextSize + if cfg.ContextSize != nil { + ctx = *cfg.ContextSize + } + if cfg.Batch == 0 { + if PhysicalBatchForContext(gpu, ctx) == BlackwellPhysicalBatch { + cfg.Batch = BlackwellPhysicalBatch + xlog.Debug("[hardware_defaults] Blackwell GPU: defaulting physical batch", + "batch", cfg.Batch, "compute_cap", gpu.ComputeCapability, "context", ctx, "vram_gib", gpu.VRAM>>30) + } + } + + // Enable concurrent serving by default on a capable GPU: without this the + // llama.cpp backend runs n_parallel=1 and serializes multi-user requests + // (continuous batching stays off). Unified KV means the slots share the + // context budget, but a context large enough to fill a single device leaves + // no room for the per-slot scratch, so the slot count is gated on per-device + // headroom too (issue #10485). Explicit parallel/n_parallel always wins. + before := len(cfg.Options) + cfg.Options = EnsureParallelOptionForContext(cfg.Options, gpu, ctx) + if len(cfg.Options) > before { + xlog.Debug("[hardware_defaults] defaulting parallel slots for concurrent serving", + "option", cfg.Options[len(cfg.Options)-1], "context", ctx, "vram_gib", gpu.VRAM>>30) + } +} + +// parseComputeCapability splits a "major.minor" string into integer parts. +// Returns (-1, -1) when it can't be parsed.
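The body of `GPU.IsNVIDIABlackwell` is not part of this diff. The test table (12.0, 12.1 and 13.0 are Blackwell; 10.0, 9.0, 8.9 and empty are not) suggests a check on the major compute-capability version, sketched below under that assumption. `parseCC` follows `parseComputeCapability`, except that it uses `strings.Cut` and names the minor part `minor` so it does not shadow Go's built-in `min`. `isConsumerBlackwell` is a hypothetical name, not the PR's method.

```go
package main

import (
	"strconv"
	"strings"
)

// parseCC mirrors parseComputeCapability: "major.minor" -> (major, minor),
// (-1, -1) when the major part is missing or not a number, minor 0 if absent.
func parseCC(cc string) (int, int) {
	cc = strings.TrimSpace(cc)
	if cc == "" {
		return -1, -1
	}
	majStr, minStr, _ := strings.Cut(cc, ".")
	major, err := strconv.Atoi(strings.TrimSpace(majStr))
	if err != nil {
		return -1, -1
	}
	minor, err := strconv.Atoi(strings.TrimSpace(minStr))
	if err != nil {
		minor = 0 // "12" or "12.x" still yields a usable major version
	}
	return major, minor
}

// isConsumerBlackwell is a hypothetical predicate inferred from the test table:
// sm_12x (GB10, RTX 50) and later count; datacenter sm_100 does not.
func isConsumerBlackwell(cc string) bool {
	major, _ := parseCC(cc)
	return major >= 12
}
```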
+func parseComputeCapability(cc string) (int, int) { + cc = strings.TrimSpace(cc) + if cc == "" { + return -1, -1 + } + majStr, minStr := cc, "0" + if dot := strings.IndexByte(cc, '.'); dot >= 0 { + majStr, minStr = cc[:dot], cc[dot+1:] + } + maj, err := strconv.Atoi(strings.TrimSpace(majStr)) + if err != nil { + return -1, -1 + } + min, err := strconv.Atoi(strings.TrimSpace(minStr)) + if err != nil { + min = 0 + } + return maj, min +} diff --git a/core/config/hardware_defaults_internal_test.go b/core/config/hardware_defaults_internal_test.go new file mode 100644 index 000000000..d6878c86e --- /dev/null +++ b/core/config/hardware_defaults_internal_test.go @@ -0,0 +1,48 @@ +package config + +import ( + . "github.com/onsi/ginkgo/v2" + . "github.com/onsi/gomega" +) + +// Single-instance path: SetDefaults applies hardware defaults from the local +// GPU. The detection seam (localGPU) is injected so the path is deterministic +// without a real GPU. +var _ = Describe("SetDefaults hardware defaults (single-instance)", func() { + const gib = uint64(1) << 30 + + var orig func() GPU + BeforeEach(func() { orig = localGPU }) + AfterEach(func() { localGPU = orig }) + + It("sets the physical batch on a local Blackwell GPU with headroom", func() { + localGPU = func() GPU { return GPU{ComputeCapability: "12.1", VRAM: 119 * gib} } + cfg := &ModelConfig{} + cfg.SetDefaults() + Expect(cfg.Batch).To(Equal(BlackwellPhysicalBatch)) + }) + + It("leaves batch unset when a large context would overflow the device", func() { + // Regression guard for issue #10485: 16 GiB consumer Blackwell + ~200k ctx. 
+ localGPU = func() GPU { return GPU{ComputeCapability: "12.0", VRAM: 16 * gib} } + ctx := 204800 + cfg := &ModelConfig{LLMConfig: LLMConfig{ContextSize: &ctx}} + cfg.SetDefaults() + Expect(cfg.Batch).To(Equal(0)) + }) + + It("leaves batch unset on a non-Blackwell local GPU", func() { + localGPU = func() GPU { return GPU{ComputeCapability: "8.9", VRAM: 119 * gib} } + cfg := &ModelConfig{} + cfg.SetDefaults() + Expect(cfg.Batch).To(Equal(0)) + }) + + It("never overrides an explicit batch", func() { + localGPU = func() GPU { return GPU{ComputeCapability: "12.1", VRAM: 119 * gib} } + cfg := &ModelConfig{} + cfg.Batch = 1024 + cfg.SetDefaults() + Expect(cfg.Batch).To(Equal(1024)) + }) +}) diff --git a/core/config/hardware_defaults_test.go b/core/config/hardware_defaults_test.go new file mode 100644 index 000000000..452a5a884 --- /dev/null +++ b/core/config/hardware_defaults_test.go @@ -0,0 +1,173 @@ +package config_test + +import ( + . "github.com/mudler/LocalAI/core/config" + . "github.com/onsi/ginkgo/v2" + . 
"github.com/onsi/gomega" +) + +var _ = Describe("Hardware-driven config defaults", func() { + const gib = uint64(1) << 30 + + DescribeTable("GPU.IsNVIDIABlackwell (sm_12x consumer family)", + func(cc string, want bool) { + Expect(GPU{ComputeCapability: cc}.IsNVIDIABlackwell()).To(Equal(want)) + }, + Entry("GB10 12.1", "12.1", true), + Entry("RTX 50 12.0", "12.0", true), + Entry("future 13.0", "13.0", true), + Entry("Hopper 9.0", "9.0", false), + Entry("Ada 8.9", "8.9", false), + Entry("datacenter Blackwell sm_100 10.0", "10.0", false), + Entry("unknown", "", false), + ) + + Describe("PhysicalBatch / IsManagedPhysicalBatch", func() { + It("returns the Blackwell batch on Blackwell", func() { + Expect(PhysicalBatch(GPU{ComputeCapability: "12.1"})).To(Equal(BlackwellPhysicalBatch)) + }) + It("returns the default batch otherwise", func() { + Expect(PhysicalBatch(GPU{ComputeCapability: "9.0"})).To(Equal(DefaultPhysicalBatch)) + Expect(PhysicalBatch(GPU{})).To(Equal(DefaultPhysicalBatch)) + }) + It("recognizes managed defaults but not explicit values", func() { + Expect(IsManagedPhysicalBatch(DefaultPhysicalBatch)).To(BeTrue()) + Expect(IsManagedPhysicalBatch(BlackwellPhysicalBatch)).To(BeTrue()) + Expect(IsManagedPhysicalBatch(1024)).To(BeFalse()) + }) + }) + + Describe("PhysicalBatchForContext (per-device VRAM headroom)", func() { + It("raises the batch when the compute buffer fits the device", func() { + // 16 GiB Blackwell with a small context: the extra scratch is tiny. + Expect(PhysicalBatchForContext(GPU{ComputeCapability: "12.0", VRAM: 16 * gib}, 8192)). + To(Equal(BlackwellPhysicalBatch)) + }) + It("keeps the default batch when a large context would overflow one device", func() { + // The issue #10485 case: 16 GiB consumer Blackwell, ~200k context. + Expect(PhysicalBatchForContext(GPU{ComputeCapability: "12.0", VRAM: 16 * gib}, 204800)). 
+ To(Equal(DefaultPhysicalBatch)) + }) + It("still raises the batch on a large unified-memory device (GB10)", func() { + // GB10 reports system RAM (~119 GiB) as its single device's VRAM. + Expect(PhysicalBatchForContext(GPU{ComputeCapability: "12.1", VRAM: 119 * gib}, 204800)). + To(Equal(BlackwellPhysicalBatch)) + }) + It("stays conservative when VRAM is unknown", func() { + Expect(PhysicalBatchForContext(GPU{ComputeCapability: "12.1"}, 8192)). + To(Equal(DefaultPhysicalBatch)) + }) + It("never raises the batch on non-Blackwell", func() { + Expect(PhysicalBatchForContext(GPU{ComputeCapability: "9.0", VRAM: 80 * gib}, 8192)). + To(Equal(DefaultPhysicalBatch)) + }) + }) + + Describe("ApplyHardwareDefaults", func() { + It("raises an unset batch to 2048 on Blackwell with headroom", func() { + cfg := &ModelConfig{} + ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1", VRAM: 119 * gib}) + Expect(cfg.Batch).To(Equal(BlackwellPhysicalBatch)) + }) + It("leaves batch unset when a large context would overflow one device", func() { + // Regression guard for issue #10485: 16 GiB card + ~200k context. 
+ ctx := 204800 + cfg := &ModelConfig{LLMConfig: LLMConfig{ContextSize: &ctx}} + ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.0", VRAM: 16 * gib}) + Expect(cfg.Batch).To(Equal(0)) + }) + It("leaves batch unset on non-Blackwell", func() { + cfg := &ModelConfig{} + ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "9.0", VRAM: 119 * gib}) + Expect(cfg.Batch).To(Equal(0)) + }) + It("never overrides an explicit batch", func() { + cfg := &ModelConfig{} + cfg.Batch = 1024 + ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1", VRAM: 119 * gib}) + Expect(cfg.Batch).To(Equal(1024)) + }) + It("no-ops on nil", func() { + Expect(func() { ApplyHardwareDefaults(nil, GPU{ComputeCapability: "12.1"}) }).ToNot(Panic()) + }) + + It("applies nothing when hardware defaults are disabled via env", func() { + GinkgoT().Setenv("LOCALAI_DISABLE_HARDWARE_DEFAULTS", "true") + Expect(HardwareDefaultsDisabled()).To(BeTrue()) + cfg := &ModelConfig{} + ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1", VRAM: 119 * gib}) + Expect(cfg.Batch).To(Equal(0)) + Expect(cfg.Options).To(BeEmpty()) + }) + }) + + DescribeTable("DefaultParallelSlots (by VRAM)", + func(vramGiB uint64, want int) { + Expect(DefaultParallelSlots(GPU{VRAM: vramGiB * gib})).To(Equal(want)) + }, + Entry("GB10 119 GiB", uint64(119), 8), + Entry("48 GiB", uint64(48), 8), + Entry("24 GiB", uint64(24), 4), + Entry("8 GiB", uint64(8), 4), + Entry("6 GiB", uint64(6), 2), + Entry("2 GiB", uint64(2), 1), + Entry("unknown 0", uint64(0), 1), + ) + + Describe("ParallelSlotsForContext (per-device VRAM headroom)", func() { + It("keeps the VRAM-scaled slot count when the context fits the device", func() { + // 16 GiB card, small context: plenty of room for concurrency. + Expect(ParallelSlotsForContext(GPU{VRAM: 16 * gib}, 8192)).To(Equal(4)) + }) + It("drops to a single slot when a large context already fills the device", func() { + // Regression guard for issue #10485: 16 GiB consumer Blackwell, ~200k + // context. 
Even with unified KV, the per-slot compute/checkpoint + // scratch from 4 slots is the straw that overflows the tighter device. + Expect(ParallelSlotsForContext(GPU{VRAM: 16 * gib}, 204800)).To(Equal(1)) + }) + It("keeps concurrency on a large unified-memory device (GB10)", func() { + // GB10 reports system RAM (~119 GiB): a 200k context leaves headroom. + Expect(ParallelSlotsForContext(GPU{VRAM: 119 * gib}, 204800)).To(Equal(8)) + }) + It("keeps concurrency on a big datacenter card with a large context", func() { + // 80 GiB A100: 200k context is a small fraction, concurrency stays. + Expect(ParallelSlotsForContext(GPU{VRAM: 80 * gib}, 204800)).To(Equal(8)) + }) + It("stays a single slot on small/unknown VRAM regardless of context", func() { + Expect(ParallelSlotsForContext(GPU{VRAM: 2 * gib}, 8192)).To(Equal(1)) + Expect(ParallelSlotsForContext(GPU{}, 8192)).To(Equal(1)) + }) + }) + + Describe("ApplyHardwareDefaults parallel slots", func() { + It("adds a VRAM-scaled parallel option on a capable GPU", func() { + cfg := &ModelConfig{} + ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.1", VRAM: 119 * gib}) + Expect(cfg.Options).To(ContainElement("parallel:8")) + }) + It("adds no parallel option when a large context already fills one device", func() { + // Regression guard for issue #10485: 16 GiB card + ~200k context. The + // model barely fits; defaulting concurrency tips the tighter GPU into + // CUDA OOM during the final (MTP draft) KV allocation. 
+ ctx := 204800 + cfg := &ModelConfig{LLMConfig: LLMConfig{ContextSize: &ctx}} + ApplyHardwareDefaults(cfg, GPU{ComputeCapability: "12.0", VRAM: 16 * gib}) + Expect(cfg.Options).ToNot(ContainElement(ContainSubstring("parallel"))) + }) + It("scales the slot count down with VRAM", func() { + cfg := &ModelConfig{} + ApplyHardwareDefaults(cfg, GPU{VRAM: 24 * gib}) + Expect(cfg.Options).To(ContainElement("parallel:4")) + }) + It("adds no parallel option on small/unknown VRAM", func() { + cfg := &ModelConfig{} + ApplyHardwareDefaults(cfg, GPU{VRAM: 2 * gib}) + Expect(cfg.Options).ToNot(ContainElement(ContainSubstring("parallel"))) + }) + It("never overrides an explicit parallel option", func() { + cfg := &ModelConfig{Options: []string{"parallel:2"}} + ApplyHardwareDefaults(cfg, GPU{VRAM: 119 * gib}) + Expect(cfg.Options).To(Equal([]string{"parallel:2"})) + }) + }) +}) diff --git a/core/config/hooks_llamacpp.go b/core/config/hooks_llamacpp.go index 4ced8a9b1..09bdbe868 100644 --- a/core/config/hooks_llamacpp.go +++ b/core/config/hooks_llamacpp.go @@ -34,7 +34,7 @@ func llamaCppDefaults(cfg *ModelConfig, modelPath string) { // Default context size if not set, regardless of whether GGUF parsing succeeds defer func() { if cfg.ContextSize == nil { - ctx := defaultContextSize + ctx := GGUFFallbackContextSize cfg.ContextSize = &ctx } }() diff --git a/core/config/meta/constants.go b/core/config/meta/constants.go index 72da2f99a..7fed6ba75 100644 --- a/core/config/meta/constants.go +++ b/core/config/meta/constants.go @@ -68,6 +68,7 @@ var UsecaseOptions = []FieldOption{ {Value: "face_recognition", Label: "Face Recognition"}, {Value: "transcript", Label: "Transcript"}, {Value: "diarization", Label: "Diarization"}, + {Value: "sound_classification", Label: "Sound Classification"}, {Value: "speaker_recognition", Label: "Speaker Recognition"}, {Value: "tts", Label: "TTS"}, {Value: "sound_generation", Label: "Sound Generation"}, diff --git a/core/config/meta/registry.go 
b/core/config/meta/registry.go index ca10f604c..3476076e1 100644 --- a/core/config/meta/registry.go +++ b/core/config/meta/registry.go @@ -286,6 +286,15 @@ func DefaultRegistry() map[string]FieldMetaOverride { Order: 45, }, + // --- Alias --- + "alias": { + Section: "alias", + Label: "Alias target", + Description: "Redirect all traffic for this model to another configured model. When set, every other field on this config is ignored and requests are served by the target model.", + Component: "model-select", + Order: 0, + }, + // --- Pipeline --- "pipeline.llm": { Section: "pipeline", @@ -319,6 +328,30 @@ func DefaultRegistry() map[string]FieldMetaOverride { AutocompleteProvider: ProviderModelsVAD, Order: 63, }, + "pipeline.sound_detection": { + Section: "pipeline", + Label: "Sound Detection Model", + Description: "Model to use for sound-event classification (audio tagging, e.g. ced) in the pipeline. When set, committed realtime audio is also classified and the scored AudioSet tags are emitted as a conversation.item.sound_detection event.", + Component: "model-select", + AutocompleteProvider: ProviderModels, + Order: 64, + }, + "pipeline.sound_detection_window_ms": { + Section: "pipeline", + Label: "Sound Detection Window (ms)", + Description: "Server-side windowing for a sound-only realtime session: length in ms of the audio window classified each hop. 0 = client-driven (the client commits windows).", + Component: "number", + Min: f64(0), + Order: 65, + }, + "pipeline.sound_detection_hop_ms": { + Section: "pipeline", + Label: "Sound Detection Hop (ms)", + Description: "Server-side windowing hop in ms: how often the server classifies the last window. 
0 = client-driven.", + Component: "number", + Min: f64(0), + Order: 66, + }, "pipeline.reasoning_effort": { Section: "pipeline", Label: "Reasoning Effort", @@ -448,6 +481,55 @@ func DefaultRegistry() map[string]FieldMetaOverride { Component: "json-editor", Order: 78, }, + "pipeline.voice_recognition.enforce": { + Section: "pipeline", + Label: "Voice Gate Enforce", + Description: "Whether the gate rejects unauthorized speakers. Enabled (default) drops unauthorized utterances before the LLM. Disabled still resolves and surfaces the speaker (for the conversation.item.speaker event and personalization) but never drops a turn.", + Component: "toggle", + Order: 80, + }, + "pipeline.voice_recognition.identity.announce": { + Section: "pipeline", + Label: "Speaker Identity Announce", + Description: "Emit a conversation.item.speaker event to the client naming the recognized speaker. When set, identity is resolved on every turn even if 'when' is 'first'.", + Component: "toggle", + Order: 81, + }, + "pipeline.voice_recognition.identity.announce_unknown": { + Section: "pipeline", + Label: "Speaker Identity Announce Unknown", + Description: "Also emit the conversation.item.speaker event (with matched=false) when no confident match is found. Default only announces on a match.", + Component: "toggle", + Order: 82, + }, + "pipeline.voice_recognition.identity.personalize": { + Section: "pipeline", + Label: "Speaker Identity Personalize", + Description: "Inform the LLM who is speaking so it can tailor replies. 
Enables the name and system-note injection below.", + Component: "toggle", + Order: 83, + }, + "pipeline.voice_recognition.identity.inject_name": { + Section: "pipeline", + Label: "Speaker Identity Inject Name", + Description: "Personalization: set the per-message OpenAI 'name' field on each user turn to the recognized speaker.", + Component: "toggle", + Order: 84, + }, + "pipeline.voice_recognition.identity.inject_system_note": { + Section: "pipeline", + Label: "Speaker Identity Inject System Note", + Description: "Personalization: append a 'The current speaker is .' note to the system message reflecting the latest speaker.", + Component: "toggle", + Order: 85, + }, + "pipeline.voice_recognition.identity.note_unknown": { + Section: "pipeline", + Label: "Speaker Identity Note Unknown", + Description: "Personalization: when the speaker is unidentified, append 'The current speaker is unknown.' to the system message so the model can ask who it is talking to.", + Component: "toggle", + Order: 86, + }, "pipeline.max_history_items": { Section: "pipeline", Label: "Max History Items", @@ -455,6 +537,36 @@ func DefaultRegistry() map[string]FieldMetaOverride { Component: "number", Order: 79, }, + "pipeline.compaction.enabled": { + Section: "pipeline", + Label: "Compaction Enabled", + Description: "Fold conversation items that age out of the live window (Max History Items) into a rolling summary instead of dropping them, so long realtime sessions stay cheap without losing earlier context. Off by default.", + Component: "toggle", + Order: 80, + }, + "pipeline.compaction.trigger_items": { + Section: "pipeline", + Label: "Compaction Trigger Items", + Description: "High-water mark: once the live conversation exceeds this many items, the overflow above Max History Items is summarized and evicted. Must be greater than Max History Items; defaults to twice it. 
The gap controls how often summarization runs.", + Component: "number", + Order: 81, + }, + "pipeline.compaction.summary_model": { + Section: "pipeline", + Label: "Compaction Summary Model", + Description: "Optional smaller/cheaper model used to produce the rolling summary. Empty reuses the pipeline's own LLM. On CPU, a tiny model here keeps compaction from competing with the conversation LLM.", + Component: "input", + Advanced: true, + Order: 82, + }, + "pipeline.compaction.max_summary_tokens": { + Section: "pipeline", + Label: "Compaction Max Summary Tokens", + Description: "Advisory cap on the rolling summary length (fed to the summarizer prompt). Defaults to 512.", + Component: "number", + Advanced: true, + Order: 83, + }, // --- Functions --- "function.grammar.parallel_calls": { diff --git a/core/config/meta/registry_test.go b/core/config/meta/registry_test.go new file mode 100644 index 000000000..e9d998609 --- /dev/null +++ b/core/config/meta/registry_test.go @@ -0,0 +1,28 @@ +package meta_test + +import ( + "github.com/mudler/LocalAI/core/config/meta" + + . "github.com/onsi/ginkgo/v2" + . 
"github.com/onsi/gomega" +) + +var _ = Describe("alias field metadata", func() { + It("registers the alias field as a model-select in the alias section", func() { + reg := meta.DefaultRegistry() + f, ok := reg["alias"] + Expect(ok).To(BeTrue(), "alias field should have a registry override") + Expect(f.Section).To(Equal("alias")) + Expect(f.Component).To(Equal("model-select")) + }) + + It("defines an alias section", func() { + var found bool + for _, s := range meta.DefaultSections() { + if s.ID == "alias" { + found = true + } + } + Expect(found).To(BeTrue(), "DefaultSections should include an alias section") + }) +}) diff --git a/core/config/meta/types.go b/core/config/meta/types.go index a86b8bb69..a29e66967 100644 --- a/core/config/meta/types.go +++ b/core/config/meta/types.go @@ -69,6 +69,7 @@ type FieldMetaOverride struct { func DefaultSections() []Section { return []Section{ {ID: "general", Label: "General", Icon: "settings", Order: 0}, + {ID: "alias", Label: "Alias", Icon: "git-merge", Order: 5}, {ID: "llm", Label: "LLM", Icon: "cpu", Order: 10}, {ID: "parameters", Label: "Parameters", Icon: "sliders", Order: 20}, {ID: "templates", Label: "Templates", Icon: "file-text", Order: 30}, diff --git a/core/config/model_config.go b/core/config/model_config.go index dfe151a64..2d1e18cc7 100644 --- a/core/config/model_config.go +++ b/core/config/model_config.go @@ -37,6 +37,12 @@ type ModelConfig struct { schema.PredictionOptions `yaml:"parameters,omitempty" json:"parameters,omitempty"` Name string `yaml:"name,omitempty" json:"name,omitempty"` + // Alias, when set, makes this config a pure redirect: every request for + // Name is served by the model named here. All other fields are ignored. + // The target must be an existing, non-alias model (enforced at load and + // at create/swap time). See docs/content for Model Aliases. 
+ Alias string `yaml:"alias,omitempty" json:"alias,omitempty"` + F16 *bool `yaml:"f16,omitempty" json:"f16,omitempty"` Threads *int `yaml:"threads,omitempty" json:"threads,omitempty"` Debug *bool `yaml:"debug,omitempty" json:"debug,omitempty"` @@ -391,6 +397,10 @@ func (c *ModelConfig) HasRouter() bool { return len(c.Router.Candidates) > 0 } +// IsAlias reports whether this config is a pure redirect to another model. +// Value receiver so it is callable on non-addressable config values too. +func (c ModelConfig) IsAlias() bool { return c.Alias != "" } + // @Description PII filtering configuration. PII redaction is per-model so // that local models don't pay the latency or behaviour change of regex // scanning, while cloud-bound traffic (cloud-proxy backend) can default to @@ -594,6 +604,20 @@ type Pipeline struct { LLM string `yaml:"llm,omitempty" json:"llm,omitempty"` Transcription string `yaml:"transcription,omitempty" json:"transcription,omitempty"` VAD string `yaml:"vad,omitempty" json:"vad,omitempty"` + // SoundDetection names a sound-event-classification model (e.g. ced). When + // set, each VAD-committed realtime utterance is also run through it and the + // scored AudioSet tags are emitted as a conversation.item.sound_detection + // server event, alongside (and independent of) transcription. + SoundDetection string `yaml:"sound_detection,omitempty" json:"sound_detection,omitempty"` + + // SoundDetectionWindowMs / SoundDetectionHopMs enable server-side windowing + // for a sound-detection-only realtime session: instead of the client + // committing audio buffers, the server classifies the last WindowMs of + // streamed audio every HopMs and emits a sound_detection event per hop. Both + // must be > 0 to activate; otherwise the session stays client-driven (the + // client commits windows via input_audio_buffer.commit). 
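The `SoundDetectionWindowMs` / `SoundDetectionHopMs` comment describes server-side windowing: classify the last WindowMs of audio every HopMs, and only when both values are positive. The scheduler itself is not in this diff, so the sketch below assumes one plausible policy: the first classification fires once a full window has been buffered, then one more fires per hop. The names `window`, `serverWindowingActive` and `plannedWindows` are hypothetical.

```go
package main

// window is a [StartMs, EndMs) slice of the streamed audio.
type window struct{ StartMs, EndMs int }

// serverWindowingActive mirrors the rule in the Pipeline comment: both values
// must be > 0, otherwise the session stays client-driven.
func serverWindowingActive(windowMs, hopMs int) bool {
	return windowMs > 0 && hopMs > 0
}

// plannedWindows lists the windows classified for streamedMs of audio under
// the assumed policy: one window per hop, starting once a full window exists.
func plannedWindows(streamedMs, windowMs, hopMs int) []window {
	if !serverWindowingActive(windowMs, hopMs) {
		return nil
	}
	var out []window
	for end := windowMs; end <= streamedMs; end += hopMs {
		out = append(out, window{StartMs: end - windowMs, EndMs: end})
	}
	return out
}
```

With a 1000 ms window and a 500 ms hop, 3 s of audio yields five overlapping windows, so each sound event is typically scored in two consecutive hops.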
+ SoundDetectionWindowMs int `yaml:"sound_detection_window_ms,omitempty" json:"sound_detection_window_ms,omitempty"` + SoundDetectionHopMs int `yaml:"sound_detection_hop_ms,omitempty" json:"sound_detection_hop_ms,omitempty"` // ReasoningEffort sets the reasoning effort (none|minimal|low|medium|high) for // the pipeline's LLM without editing the LLM model config. Overrides the LLM's @@ -617,11 +641,32 @@ type Pipeline struct { // context fills. MaxHistoryItems *int `yaml:"max_history_items,omitempty" json:"max_history_items,omitempty"` + // Compaction folds conversation items that age out of the live window + // (max_history_items) into a rolling summary instead of dropping them, so + // long realtime sessions stay cheap without losing earlier context. Nil + // (block absent) means disabled, preserving existing behavior. + Compaction *PipelineCompaction `yaml:"compaction,omitempty" json:"compaction,omitempty"` + // VoiceRecognition gates the pipeline behind speaker verification. Nil // (block absent) means no gate, preserving existing behavior. VoiceRecognition *PipelineVoiceRecognition `yaml:"voice_recognition,omitempty" json:"voice_recognition,omitempty"` } +// PipelineCompaction configures summarize-then-drop for a realtime pipeline. +type PipelineCompaction struct { + // Enabled turns summarize-then-drop on. Default false. + Enabled bool `yaml:"enabled,omitempty" json:"enabled,omitempty"` + // TriggerItems is the high-water mark: once live items exceed it, overflow + // above max_history_items is summarized and evicted. Must exceed + // max_history_items; clamped up if not. Default: 2x max_history_items. + TriggerItems int `yaml:"trigger_items,omitempty" json:"trigger_items,omitempty"` + // SummaryModel optionally names a smaller/cheaper model for the summary + // call. Empty uses the pipeline's own LLM. + SummaryModel string `yaml:"summary_model,omitempty" json:"summary_model,omitempty"` + // MaxSummaryTokens advises the summary length (fed to the prompt). 
Default 512. + MaxSummaryTokens int `yaml:"max_summary_tokens,omitempty" json:"max_summary_tokens,omitempty"` +} + // ApplyReasoningEffort resolves the effective reasoning effort — a per-request // value (requestEffort) overrides the config's own ReasoningEffort default — // stores it on the config so gRPCPredictOpts forwards it to the backend as the @@ -759,6 +804,13 @@ type PipelineVoiceRecognition struct { Allow VoiceRecognitionAllow `yaml:"allow,omitempty" json:"allow,omitempty"` // References are the authorized reference speakers (verify mode). References []VoiceReference `yaml:"references,omitempty" json:"references,omitempty"` + // Enforce controls the authorization gate. A nil value or true rejects + // unauthorized speakers (the historical behavior). false resolves the + // speaker's identity for surfacing/personalization but never drops a turn. + Enforce *bool `yaml:"enforce,omitempty" json:"enforce,omitempty"` + // Identity surfaces the recognized speaker to the client and the LLM. It is + // independent of Enforce: identity can be surfaced without gating. + Identity *VoiceIdentityConfig `yaml:"identity,omitempty" json:"identity,omitempty"` } // @Description VoiceRecognitionAllow filters authorized registry identities. @@ -775,6 +827,25 @@ type VoiceReference struct { Audio string `yaml:"audio,omitempty" json:"audio,omitempty"` } +// @Description VoiceIdentityConfig surfaces the recognized speaker to the realtime +// client and the LLM. When set, identity is resolved on every turn even if the +// gate's When is "first" (the gate still authorizes only once). +type VoiceIdentityConfig struct { + // Announce emits a conversation.item.speaker event to the client. + Announce bool `yaml:"announce,omitempty" json:"announce,omitempty"` + // AnnounceUnknown also emits the event when there is no confident match. + AnnounceUnknown bool `yaml:"announce_unknown,omitempty" json:"announce_unknown,omitempty"` + // Personalize informs the LLM who is speaking. 
+ Personalize bool `yaml:"personalize,omitempty" json:"personalize,omitempty"` + // InjectName sets the per-message name field on each user turn. + InjectName bool `yaml:"inject_name,omitempty" json:"inject_name,omitempty"` + // InjectSystemNote maintains a "current speaker" note in the system message. + InjectSystemNote bool `yaml:"inject_system_note,omitempty" json:"inject_system_note,omitempty"` + // NoteUnknown adds a "the current speaker is unknown" note (enables the model + // to ask who it is talking to). + NoteUnknown bool `yaml:"note_unknown,omitempty" json:"note_unknown,omitempty"` +} + // VoiceGateEnabled reports whether a voice-recognition gate is configured. The // mere presence of the block is the intent signal: a present-but-incomplete // block (e.g. missing model) must fail closed at construction, not be silently @@ -783,6 +854,28 @@ func (p Pipeline) VoiceGateEnabled() bool { return p.VoiceRecognition != nil } +// EnforceGate reports whether the gate rejects unauthorized speakers. A nil +// Enforce means "enforce" so existing configs keep gating. +func (p PipelineVoiceRecognition) EnforceGate() bool { + return p.Enforce == nil || *p.Enforce +} + +// IdentityEnabled reports whether the speaker's identity must be resolved for +// surfacing or personalization. +func (p PipelineVoiceRecognition) IdentityEnabled() bool { + return p.Identity != nil && (p.Identity.Announce || p.Identity.Personalize) +} + +// AnnounceEnabled reports whether to emit the conversation.item.speaker event. +func (p PipelineVoiceRecognition) AnnounceEnabled() bool { + return p.Identity != nil && p.Identity.Announce +} + +// PersonalizeEnabled reports whether to inform the LLM of the speaker. +func (p PipelineVoiceRecognition) PersonalizeEnabled() bool { + return p.Identity != nil && p.Identity.Personalize +} + // Normalize fills in defaults in place for omitted fields. 
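`Enforce` is a `*bool` so that "absent" and "explicitly false" stay distinguishable: an omitted field keeps the historical gating, while `enforce: false` resolves identity without dropping turns. The sketch below shows that tri-state behaviour with trimmed-down copies of the types (only the fields used here). It decodes JSON rather than YAML because the fields carry both tags and `encoding/json` is in the standard library. `voiceRecognition`, `identity` and `parse` are illustrative names, not the PR's.

```go
package main

import "encoding/json"

// Trimmed-down copies of VoiceIdentityConfig / PipelineVoiceRecognition.
type identity struct {
	Announce    bool `json:"announce,omitempty"`
	Personalize bool `json:"personalize,omitempty"`
}

type voiceRecognition struct {
	Enforce  *bool     `json:"enforce,omitempty"`
	Identity *identity `json:"identity,omitempty"`
}

// enforceGate mirrors EnforceGate: nil (field absent) means "enforce".
func (v voiceRecognition) enforceGate() bool { return v.Enforce == nil || *v.Enforce }

// identityEnabled mirrors IdentityEnabled: resolve identity only when
// something (announce or personalize) will consume it.
func (v voiceRecognition) identityEnabled() bool {
	return v.Identity != nil && (v.Identity.Announce || v.Identity.Personalize)
}

// parse decodes a JSON config fragment, panicking on malformed input.
func parse(doc string) voiceRecognition {
	var v voiceRecognition
	if err := json.Unmarshal([]byte(doc), &v); err != nil {
		panic(err)
	}
	return v
}
```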
func (v *PipelineVoiceRecognition) Normalize() { if v.Mode == "" { @@ -1111,107 +1204,17 @@ func (cfg *ModelConfig) SetDefaults(opts ...ConfigLoaderOption) { // This ensures gallery-installed and runtime-loaded models get optimal parameters. ApplyInferenceDefaults(cfg, cfg.Name, cfg.Model) - // https://github.com/ggerganov/llama.cpp/blob/75cd4c77292034ecec587ecb401366f57338f7c0/common/sampling.h#L22 - defaultTopP := 0.95 - defaultTopK := 40 - defaultMinP := 0.0 - defaultTemp := 0.9 - // https://github.com/mudler/LocalAI/issues/2780 - defaultMirostat := 0 - defaultMirostatTAU := 5.0 - defaultMirostatETA := 0.1 - defaultTypicalP := 1.0 - defaultTFZ := 1.0 - defaultZero := 0 + // Apply serving-policy defaults (device-independent): cross-request prefix + // caching. Propagates to distributed nodes via the model options. + ApplyServingDefaults(cfg) + + // Generic fallback defaults (sampling params + runtime flags), applied after + // the model-family / hardware / serving tiers above. Only fills unset values. + ApplyGenericDefaults(cfg) trueV := true falseV := false - if cfg.Seed == nil { - // random number generator seed - defaultSeed := RAND_SEED - cfg.Seed = &defaultSeed - } - - // top_k=40 is llama.cpp's sampling default and is wrong for backends whose - // native default differs (issue #6632). Only inject it for the llama.cpp - // family and the empty/auto backend; leave TopK nil for known non-llama - // backends (e.g. mlx, whose intended default is top_k=0) so the wire value - // is 0 rather than a silently-changed 40. 
- if cfg.TopK == nil && UsesLlamaSamplerDefaults(cfg.Backend) { - cfg.TopK = &defaultTopK - } - - if cfg.MinP == nil { - cfg.MinP = &defaultMinP - } - - if cfg.TypicalP == nil { - cfg.TypicalP = &defaultTypicalP - } - - if cfg.TFZ == nil { - cfg.TFZ = &defaultTFZ - } - - if cfg.MMap == nil { - // MMap is enabled by default - - // Only exception is for Intel GPUs - if os.Getenv("XPU") != "" { - cfg.MMap = &falseV - } else { - cfg.MMap = &trueV - } - } - - if cfg.MMlock == nil { - // MMlock is disabled by default - cfg.MMlock = &falseV - } - - if cfg.TopP == nil { - cfg.TopP = &defaultTopP - } - if cfg.Temperature == nil { - cfg.Temperature = &defaultTemp - } - - if cfg.Maxtokens == nil { - cfg.Maxtokens = &defaultZero - } - - if cfg.Mirostat == nil { - cfg.Mirostat = &defaultMirostat - } - - if cfg.MirostatETA == nil { - cfg.MirostatETA = &defaultMirostatETA - } - - if cfg.MirostatTAU == nil { - cfg.MirostatTAU = &defaultMirostatTAU - } - - if cfg.LowVRAM == nil { - cfg.LowVRAM = &falseV - } - - if cfg.Embeddings == nil { - cfg.Embeddings = &falseV - } - - if cfg.Reranking == nil { - cfg.Reranking = &falseV - } - - if cfg.PromptCacheAll == nil { - // Match upstream llama.cpp's default (common/common.h: cache_prompt = true) - // and let cache_idle_slots / kv_unified actually do useful work; users can - // opt out with an explicit `prompt_cache_all: false` in the model YAML. - cfg.PromptCacheAll = &trueV - } - if threads == 0 { // Threads can't be 0 threads = 4 @@ -1239,10 +1242,36 @@ func (cfg *ModelConfig) SetDefaults(opts ...ConfigLoaderOption) { cfg.ContextSize = &ctx } runBackendHooks(cfg, lo.modelPath) + + // Apply hardware-driven defaults (e.g. 
a larger physical batch on Blackwell) + // LAST, after the context size is fully resolved (explicit config, LoadOptions, + // then the GGUF guess inside runBackendHooks): the Blackwell batch guard sizes + // the per-device compute buffer against this model's context, so it must see + // the final value, not a pre-guess nil. Uses the local GPU here; in distributed + // mode the router re-applies the same heuristics for the selected node's GPU + // before loading. Explicit config always wins. + ApplyHardwareDefaults(cfg, localGPU()) + cfg.syncKnownUsecasesFromString() } func (c *ModelConfig) Validate() (bool, error) { + // An alias is a pure redirect: validate only its own shape here. Target + // existence and the no-chain rule need the full config set, so the loader + // (load-time) and the create/swap endpoints enforce those. + if c.IsAlias() { + if c.Name == "" { + return false, fmt.Errorf("alias config requires a name") + } + if c.Alias == c.Name { + return false, fmt.Errorf("alias %q cannot point to itself", c.Name) + } + if c.Backend != "" || c.Model != "" { + return false, fmt.Errorf("alias config %q must not set backend or parameters.model: an alias is a pure redirect", c.Name) + } + return true, nil + } + downloadedFileNames := []string{} for _, f := range c.DownloadFiles { downloadedFileNames = append(downloadedFileNames, f.Filename) @@ -1463,6 +1492,11 @@ const ( // so it may combine freely with other usecases. FLAG_TOKEN_CLASSIFY ModelConfigUsecase = 0b1000000000000000000000 + // Marks a model as wired for the SoundDetection gRPC primitive + // (audio tagging / sound-event classification — scored AudioSet + // labels via the SoundDetection RPC, e.g. ced). + FLAG_SOUND_CLASSIFICATION ModelConfigUsecase = 0b10000000000000000000000 + // Common Subsets FLAG_LLM ModelConfigUsecase = FLAG_CHAT | FLAG_COMPLETION | FLAG_EDIT ) @@ -1471,12 +1505,12 @@ const ( // Flags within the same group are NOT orthogonal (e.g., chat and completion are // both text/language). 
A model is multimodal when its usecases span 2+ groups. var ModalityGroups = []ModelConfigUsecase{ - FLAG_CHAT | FLAG_COMPLETION | FLAG_EDIT, // text/language - FLAG_VISION | FLAG_DETECTION, // visual understanding - FLAG_TRANSCRIPT | FLAG_REALTIME_AUDIO, // speech input — realtime_audio is any-to-any, so it counts here too - FLAG_TTS | FLAG_SOUND_GENERATION | FLAG_REALTIME_AUDIO, // audio output — and here, so a lone realtime_audio flag still reads as multimodal - FLAG_AUDIO_TRANSFORM, // audio in/out transforms - FLAG_IMAGE | FLAG_VIDEO, // visual generation + FLAG_CHAT | FLAG_COMPLETION | FLAG_EDIT, // text/language + FLAG_VISION | FLAG_DETECTION, // visual understanding + FLAG_TRANSCRIPT | FLAG_REALTIME_AUDIO | FLAG_SOUND_CLASSIFICATION, // audio input — realtime_audio is any-to-any, so it counts here too + FLAG_TTS | FLAG_SOUND_GENERATION | FLAG_REALTIME_AUDIO, // audio output — and here, so a lone realtime_audio flag still reads as multimodal + FLAG_AUDIO_TRANSFORM, // audio in/out transforms + FLAG_IMAGE | FLAG_VIDEO, // visual generation } // IsMultimodal returns true if the given usecases span two or more orthogonal @@ -1499,29 +1533,30 @@ func GetAllModelConfigUsecases() map[string]ModelConfigUsecase { return map[string]ModelConfigUsecase{ // Note: FLAG_ANY is intentionally excluded from this map // because it's 0 and would always match in HasUsecases checks - "FLAG_CHAT": FLAG_CHAT, - "FLAG_COMPLETION": FLAG_COMPLETION, - "FLAG_EDIT": FLAG_EDIT, - "FLAG_EMBEDDINGS": FLAG_EMBEDDINGS, - "FLAG_RERANK": FLAG_RERANK, - "FLAG_IMAGE": FLAG_IMAGE, - "FLAG_TRANSCRIPT": FLAG_TRANSCRIPT, - "FLAG_TTS": FLAG_TTS, - "FLAG_SOUND_GENERATION": FLAG_SOUND_GENERATION, - "FLAG_TOKENIZE": FLAG_TOKENIZE, - "FLAG_VAD": FLAG_VAD, - "FLAG_LLM": FLAG_LLM, - "FLAG_VIDEO": FLAG_VIDEO, - "FLAG_DETECTION": FLAG_DETECTION, - "FLAG_VISION": FLAG_VISION, - "FLAG_FACE_RECOGNITION": FLAG_FACE_RECOGNITION, - "FLAG_SPEAKER_RECOGNITION": FLAG_SPEAKER_RECOGNITION, - "FLAG_AUDIO_TRANSFORM": 
FLAG_AUDIO_TRANSFORM, - "FLAG_DIARIZATION": FLAG_DIARIZATION, - "FLAG_REALTIME_AUDIO": FLAG_REALTIME_AUDIO, - "FLAG_SCORE": FLAG_SCORE, - "FLAG_DEPTH": FLAG_DEPTH, - "FLAG_TOKEN_CLASSIFY": FLAG_TOKEN_CLASSIFY, + "FLAG_CHAT": FLAG_CHAT, + "FLAG_COMPLETION": FLAG_COMPLETION, + "FLAG_EDIT": FLAG_EDIT, + "FLAG_EMBEDDINGS": FLAG_EMBEDDINGS, + "FLAG_RERANK": FLAG_RERANK, + "FLAG_IMAGE": FLAG_IMAGE, + "FLAG_TRANSCRIPT": FLAG_TRANSCRIPT, + "FLAG_TTS": FLAG_TTS, + "FLAG_SOUND_GENERATION": FLAG_SOUND_GENERATION, + "FLAG_TOKENIZE": FLAG_TOKENIZE, + "FLAG_VAD": FLAG_VAD, + "FLAG_LLM": FLAG_LLM, + "FLAG_VIDEO": FLAG_VIDEO, + "FLAG_DETECTION": FLAG_DETECTION, + "FLAG_VISION": FLAG_VISION, + "FLAG_FACE_RECOGNITION": FLAG_FACE_RECOGNITION, + "FLAG_SPEAKER_RECOGNITION": FLAG_SPEAKER_RECOGNITION, + "FLAG_AUDIO_TRANSFORM": FLAG_AUDIO_TRANSFORM, + "FLAG_DIARIZATION": FLAG_DIARIZATION, + "FLAG_SOUND_CLASSIFICATION": FLAG_SOUND_CLASSIFICATION, + "FLAG_REALTIME_AUDIO": FLAG_REALTIME_AUDIO, + "FLAG_SCORE": FLAG_SCORE, + "FLAG_DEPTH": FLAG_DEPTH, + "FLAG_TOKEN_CLASSIFY": FLAG_TOKEN_CLASSIFY, } } @@ -1724,6 +1759,16 @@ func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool { } } + if (u & FLAG_SOUND_CLASSIFICATION) == FLAG_SOUND_CLASSIFICATION { + // ced is a sound-event tagger (AudioSet labels) surfaced via the + // SoundDetection gRPC. Models without an explicit known_usecases + // still surface when they run on one of these backends. 
+ soundClassificationBackends := []string{"ced"} + if !slices.Contains(soundClassificationBackends, c.Backend) { + return false + } + } + if (u & FLAG_REALTIME_AUDIO) == FLAG_REALTIME_AUDIO { // Backends that own a single any-to-any loop and implement // AudioToAudioStream — listed here so models without an explicit diff --git a/core/config/model_config_loader.go b/core/config/model_config_loader.go index 89f4bc5cb..e2f43e83f 100644 --- a/core/config/model_config_loader.go +++ b/core/config/model_config_loader.go @@ -294,6 +294,44 @@ func (bcl *ModelConfigLoader) UpdateModelConfig(m string, updater func(*ModelCon } } +// ResolveAlias follows a one-hop alias to its target config. Returns +// (resolved, wasAlias, err). Non-alias configs return (cfg, false, nil) +// unchanged. Strict: the target must exist and must not itself be an alias +// (chains are rejected). The returned config is a copy of the target. +func (bcl *ModelConfigLoader) ResolveAlias(cfg *ModelConfig) (*ModelConfig, bool, error) { + if cfg == nil || !cfg.IsAlias() { + return cfg, false, nil + } + target, exists := bcl.GetModelConfig(cfg.Alias) + if !exists { + return nil, true, fmt.Errorf("alias %q points to unknown model %q", cfg.Name, cfg.Alias) + } + if target.IsAlias() { + return nil, true, fmt.Errorf("alias %q points to another alias %q (chains are not allowed)", cfg.Name, cfg.Alias) + } + return &target, true, nil +} + +// ValidateAliasTarget checks an alias config's target at create/swap time: +// the target must exist, must not be an alias, and must not be disabled. +// Returns nil for non-alias configs. 
+func (bcl *ModelConfigLoader) ValidateAliasTarget(cfg *ModelConfig) error { + if cfg == nil || !cfg.IsAlias() { + return nil + } + target, exists := bcl.GetModelConfig(cfg.Alias) + if !exists { + return fmt.Errorf("alias target %q does not exist", cfg.Alias) + } + if target.IsAlias() { + return fmt.Errorf("alias target %q is itself an alias (chains are not allowed)", cfg.Alias) + } + if target.IsDisabled() { + return fmt.Errorf("alias target %q is disabled", cfg.Alias) + } + return nil +} + // Preload prepare models if they are not local but url or huggingface repositories func (bcl *ModelConfigLoader) Preload(modelPath string) error { bcl.Lock() @@ -475,5 +513,21 @@ func (bcl *ModelConfigLoader) LoadModelConfigsFromPath(path string, opts ...Conf } } + // Surface aliases whose targets are missing or themselves aliases. These + // resolve to a clear request-time error; warning here gives operators + // visibility without failing startup. + for name, c := range bcl.configs { + if !c.IsAlias() { + continue + } + target, ok := bcl.configs[c.Alias] + switch { + case !ok: + xlog.Warn("alias points to unknown model", "alias", name, "target", c.Alias) + case target.IsAlias(): + xlog.Warn("alias points to another alias (chains are not allowed)", "alias", name, "target", c.Alias) + } + } + return nil } diff --git a/core/config/model_config_loader_test.go b/core/config/model_config_loader_test.go index 924a4d1e4..06ab65a20 100644 --- a/core/config/model_config_loader_test.go +++ b/core/config/model_config_loader_test.go @@ -61,3 +61,51 @@ var _ = Describe("ModelConfigLoader.GetModelsConflictingWith", func() { Expect(bcl.GetModelsConflictingWith("a")).To(ConsistOf("b")) }) }) + +var _ = Describe("ModelConfigLoader alias resolution", func() { + var loader *ModelConfigLoader + + BeforeEach(func() { + loader = NewModelConfigLoader("") + loader.configs["real"] = ModelConfig{Name: "real", Backend: "llama-cpp"} + loader.configs["gpt-4"] = ModelConfig{Name: "gpt-4", Alias: "real"} + 
loader.configs["chain"] = ModelConfig{Name: "chain", Alias: "gpt-4"} + loader.configs["dangling"] = ModelConfig{Name: "dangling", Alias: "nope"} + }) + + It("returns non-alias configs unchanged", func() { + cfg := loader.configs["real"] + got, was, err := loader.ResolveAlias(&cfg) + Expect(err).ToNot(HaveOccurred()) + Expect(was).To(BeFalse()) + Expect(got.Name).To(Equal("real")) + }) + + It("resolves an alias to its target", func() { + cfg := loader.configs["gpt-4"] + got, was, err := loader.ResolveAlias(&cfg) + Expect(err).ToNot(HaveOccurred()) + Expect(was).To(BeTrue()) + Expect(got.Name).To(Equal("real")) + }) + + It("rejects an alias chain", func() { + cfg := loader.configs["chain"] + _, was, err := loader.ResolveAlias(&cfg) + Expect(was).To(BeTrue()) + Expect(err).To(MatchError(ContainSubstring("chains are not allowed"))) + }) + + It("rejects a dangling alias", func() { + cfg := loader.configs["dangling"] + _, _, err := loader.ResolveAlias(&cfg) + Expect(err).To(MatchError(ContainSubstring("unknown model"))) + }) + + It("ValidateAliasTarget passes for a real target and fails for a chain", func() { + good := loader.configs["gpt-4"] + Expect(loader.ValidateAliasTarget(&good)).ToNot(HaveOccurred()) + bad := loader.configs["chain"] + Expect(loader.ValidateAliasTarget(&bad)).To(MatchError(ContainSubstring("itself an alias"))) + }) +}) diff --git a/core/config/model_config_test.go b/core/config/model_config_test.go index 7f256354d..2f2f3fd82 100644 --- a/core/config/model_config_test.go +++ b/core/config/model_config_test.go @@ -787,3 +787,32 @@ var _ = Describe("pattern detector config", func() { Expect(err).To(MatchError(ContainSubstring("pattern \"EMAILish\""))) }) }) + +var _ = Describe("ModelConfig alias", func() { + It("reports IsAlias when alias is set", func() { + c := ModelConfig{Name: "gpt-4", Alias: "my-llama-3"} + Expect(c.IsAlias()).To(BeTrue()) + Expect(ModelConfig{Name: "real"}.IsAlias()).To(BeFalse()) + }) + + It("validates a minimal alias config", 
func() { + c := ModelConfig{Name: "gpt-4", Alias: "my-llama-3"} + ok, err := c.Validate() + Expect(err).ToNot(HaveOccurred()) + Expect(ok).To(BeTrue()) + }) + + It("rejects an alias pointing to itself", func() { + c := ModelConfig{Name: "loop", Alias: "loop"} + ok, err := c.Validate() + Expect(ok).To(BeFalse()) + Expect(err).To(MatchError(ContainSubstring("itself"))) + }) + + It("rejects an alias that also sets a backend", func() { + c := ModelConfig{Name: "gpt-4", Alias: "my-llama-3", Backend: "llama-cpp"} + ok, err := c.Validate() + Expect(ok).To(BeFalse()) + Expect(err).To(MatchError(ContainSubstring("pure redirect"))) + }) +}) diff --git a/core/config/runtime_settings.go b/core/config/runtime_settings.go index 5c5f2986f..a7f15e658 100644 --- a/core/config/runtime_settings.go +++ b/core/config/runtime_settings.go @@ -28,6 +28,7 @@ type RuntimeSettings struct { // Eviction settings ForceEvictionWhenBusy *bool `json:"force_eviction_when_busy,omitempty"` // Force eviction even when models have active API calls (default: false for safety) + SizeAwareEviction *bool `json:"size_aware_eviction,omitempty"` // Evict largest models first rather than least-recently-used (default: false) LRUEvictionMaxRetries *int `json:"lru_eviction_max_retries,omitempty"` // Maximum number of retries when waiting for busy models to become idle (default: 30) LRUEvictionRetryInterval *string `json:"lru_eviction_retry_interval,omitempty"` // Interval between retries when waiting for busy models (e.g., 1s, 2s) (default: 1s) diff --git a/core/config/runtime_settings_persist.go b/core/config/runtime_settings_persist.go index bb5a4110b..98a923b40 100644 --- a/core/config/runtime_settings_persist.go +++ b/core/config/runtime_settings_persist.go @@ -5,6 +5,7 @@ import ( "errors" "os" "path/filepath" + "reflect" ) // runtimeSettingsFile is the on-disk filename inside DynamicConfigsDir. 
@@ -33,6 +34,35 @@ func (o *ApplicationConfig) ReadPersistedSettings() (RuntimeSettings, error) { return settings, nil } +// MergeNonNil overlays every set (non-nil) field of overlay onto the +// receiver, leaving the receiver's value untouched wherever overlay left a +// field unset. Every RuntimeSettings field is a pointer precisely so "set" +// can be told apart from "absent" (see the type doc), which makes this a +// faithful partial update: a caller that submits only the field it owns +// changes exactly that field and never clobbers unrelated settings. +// +// This is the read-modify-write contract the persistence helpers exist for. +// UpdateSettingsEndpoint reads the on-disk settings, merges the request body +// on top, and writes the result — so a focused admin page that POSTs only its +// own field (the Middleware page sends only mitm_listen; the detector table +// only pii_default_detectors) no longer nulls every other setting. +// +// Reflection keeps the merge total over the struct: a field added to +// RuntimeSettings later is merged automatically, so the persistence path can +// never silently drop a new setting the way a hand-maintained field list +// would. Non-pointer fields (none today) are skipped — they cannot express +// "absent", so the receiver wins. +func (s *RuntimeSettings) MergeNonNil(overlay RuntimeSettings) { + dst := reflect.ValueOf(s).Elem() + src := reflect.ValueOf(overlay) + for i := 0; i < src.NumField(); i++ { + f := src.Field(i) + if f.Kind() == reflect.Pointer && !f.IsNil() { + dst.Field(i).Set(f) + } + } +} + // WritePersistedSettings serialises the given RuntimeSettings to // runtime_settings.json with restricted permissions (it may carry API // keys and P2P tokens). 
diff --git a/core/config/runtime_settings_persist_test.go b/core/config/runtime_settings_persist_test.go index a36acb0d2..7f5eb07a8 100644 --- a/core/config/runtime_settings_persist_test.go +++ b/core/config/runtime_settings_persist_test.go @@ -12,6 +12,7 @@ import ( ) func strPtr(s string) *string { return &s } +func boolPtr(b bool) *bool { return &b } var _ = Describe("RuntimeSettings persistence helpers", func() { var ( @@ -51,6 +52,47 @@ var _ = Describe("RuntimeSettings persistence helpers", func() { }) }) + // MergeNonNil is the partial-update primitive UpdateSettingsEndpoint + // relies on: a focused admin page POSTs only the field it owns, and the + // handler reads the on-disk settings and overlays the request on top. + // Without it, the body would be written verbatim and every field the + // caller omitted would be nulled (the reported regression: changing + // mitm_listen wiped the galleries, api keys, watchdog config, etc.). + Describe("MergeNonNil partial update", func() { + It("overlays set fields and preserves unset ones", func() { + base := config.RuntimeSettings{ + MITMListen: strPtr(":9000"), + Galleries: &[]config.Gallery{{Name: "g1", URL: "http://example/g1"}}, + WatchdogIdleEnabled: boolPtr(true), + ApiKeys: &[]string{"persisted-key"}, + PIIDefaultDetectors: &[]string{"det-a"}, + } + + // Simulate the Middleware proxy tab: only mitm_listen is sent. + overlay := config.RuntimeSettings{MITMListen: strPtr(":8443")} + base.MergeNonNil(overlay) + + Expect(base.MITMListen).ToNot(BeNil()) + Expect(*base.MITMListen).To(Equal(":8443"), "set field should be overlaid") + // Everything the overlay left unset must survive untouched. 
+ Expect(base.Galleries).ToNot(BeNil(), "galleries were clobbered") + Expect(*base.Galleries).To(HaveLen(1)) + Expect(base.WatchdogIdleEnabled).ToNot(BeNil()) + Expect(*base.WatchdogIdleEnabled).To(BeTrue()) + Expect(base.ApiKeys).ToNot(BeNil(), "api_keys were clobbered") + Expect(*base.ApiKeys).To(Equal([]string{"persisted-key"})) + Expect(base.PIIDefaultDetectors).ToNot(BeNil(), "pii_default_detectors were clobbered") + Expect(*base.PIIDefaultDetectors).To(Equal([]string{"det-a"})) + }) + + It("lets an explicit empty slice clear a field", func() { + base := config.RuntimeSettings{PIIDefaultDetectors: &[]string{"det-a"}} + base.MergeNonNil(config.RuntimeSettings{PIIDefaultDetectors: &[]string{}}) + Expect(base.PIIDefaultDetectors).ToNot(BeNil()) + Expect(*base.PIIDefaultDetectors).To(BeEmpty(), "an explicit empty slice should clear, not preserve") + }) + }) + // MITM round trip pins the contract that loadRuntimeSettingsFromFile // MITM listener address must survive a write/read round trip so the // next process restart can bring the listener back up. (Intercept diff --git a/core/config/serving_defaults.go b/core/config/serving_defaults.go new file mode 100644 index 000000000..3b10e7000 --- /dev/null +++ b/core/config/serving_defaults.go @@ -0,0 +1,56 @@ +package config + +import ( + "fmt" + "strings" + + "github.com/mudler/xlog" +) + +// Serving-policy model-config defaults. +// +// Sibling to hardware_defaults.go: those fill values driven by the target +// *device* (Blackwell batch, VRAM-scaled parallel slots); these fill values +// that improve multi-request / multi-user *serving* regardless of the GPU. They +// run together from SetDefaults and only ever fill values the user left unset. + +// DefaultCacheReuse is the minimum shared-prefix chunk (in tokens) the backend +// reuses across requests via KV-cache shifting. 
The llama.cpp backend ships this +// disabled (n_cache_reuse = 0); we enable it so repeated prefixes (system +// prompts, RAG context, agent scaffolds, multi-turn chat) are not recomputed. +// This is the universally-useful part of "paged attention" (cross-request prefix +// sharing) and needs none of the block-KV machinery. +const DefaultCacheReuse = 256 + +// ApplyServingDefaults fills serving-policy ModelConfig values the user left +// unset. Currently: enable cross-request prefix caching. Explicit +// cache_reuse/n_cache_reuse in the model options always wins. +func ApplyServingDefaults(cfg *ModelConfig) { + if cfg == nil { + return + } + if !backendOptionSet(cfg.Options, "cache_reuse", "n_cache_reuse") { + cfg.Options = append(cfg.Options, fmt.Sprintf("cache_reuse:%d", DefaultCacheReuse)) + xlog.Debug("[serving_defaults] enabling cross-request prefix cache", + "cache_reuse", DefaultCacheReuse) + } +} + +// backendOptionSet reports whether the backend options already set any of names. +// Options are "name:value" strings (or bare "name"); used so we never override +// an explicit value. Shared with hardware_defaults.go. +func backendOptionSet(opts []string, names ...string) bool { + for _, o := range opts { + name := o + if i := strings.IndexByte(o, ':'); i >= 0 { + name = o[:i] + } + name = strings.TrimSpace(strings.ToLower(name)) + for _, n := range names { + if name == n { + return true + } + } + } + return false +} diff --git a/core/config/serving_defaults_test.go b/core/config/serving_defaults_test.go new file mode 100644 index 000000000..2a5bba72a --- /dev/null +++ b/core/config/serving_defaults_test.go @@ -0,0 +1,30 @@ +package config_test + +import ( + . "github.com/mudler/LocalAI/core/config" + . "github.com/onsi/ginkgo/v2" + . 
"github.com/onsi/gomega" +) + +var _ = Describe("Serving-policy config defaults", func() { + Describe("ApplyServingDefaults (cross-request prefix cache)", func() { + It("enables cache_reuse when unset", func() { + cfg := &ModelConfig{} + ApplyServingDefaults(cfg) + Expect(cfg.Options).To(ContainElement("cache_reuse:256")) + }) + It("never overrides an explicit cache_reuse", func() { + cfg := &ModelConfig{Options: []string{"cache_reuse:0"}} + ApplyServingDefaults(cfg) + Expect(cfg.Options).To(Equal([]string{"cache_reuse:0"})) + }) + It("recognizes the n_cache_reuse alias", func() { + cfg := &ModelConfig{Options: []string{"n_cache_reuse:512"}} + ApplyServingDefaults(cfg) + Expect(cfg.Options).To(Equal([]string{"n_cache_reuse:512"})) + }) + It("no-ops on nil", func() { + Expect(func() { ApplyServingDefaults(nil) }).ToNot(Panic()) + }) + }) +}) diff --git a/core/config/voice_gate_test.go b/core/config/voice_gate_test.go index 5c7782f1c..c0d25bf82 100644 --- a/core/config/voice_gate_test.go +++ b/core/config/voice_gate_test.go @@ -70,4 +70,32 @@ var _ = Describe("PipelineVoiceRecognition", func() { Expect((Pipeline{VoiceRecognition: &PipelineVoiceRecognition{}}).VoiceGateEnabled()).To(BeTrue()) }) }) + + Describe("Enforce / Identity helpers", func() { + It("treats a nil Enforce as enforcing (backward compatible)", func() { + v := PipelineVoiceRecognition{Model: "spk"} + Expect(v.EnforceGate()).To(BeTrue()) + }) + It("honors an explicit enforce:false", func() { + off := false + v := PipelineVoiceRecognition{Model: "spk", Enforce: &off} + Expect(v.EnforceGate()).To(BeFalse()) + }) + It("reports identity disabled when no identity block is set", func() { + v := PipelineVoiceRecognition{Model: "spk"} + Expect(v.IdentityEnabled()).To(BeFalse()) + Expect(v.AnnounceEnabled()).To(BeFalse()) + Expect(v.PersonalizeEnabled()).To(BeFalse()) + }) + It("reports identity enabled when announce or personalize is on", func() { + v := PipelineVoiceRecognition{Model: "spk", Identity: 
&VoiceIdentityConfig{Announce: true}} + Expect(v.IdentityEnabled()).To(BeTrue()) + Expect(v.AnnounceEnabled()).To(BeTrue()) + Expect(v.PersonalizeEnabled()).To(BeFalse()) + + v2 := PipelineVoiceRecognition{Model: "spk", Identity: &VoiceIdentityConfig{Personalize: true}} + Expect(v2.IdentityEnabled()).To(BeTrue()) + Expect(v2.PersonalizeEnabled()).To(BeTrue()) + }) + }) }) diff --git a/core/http/app.go b/core/http/app.go index 9ec0711fb..ee5cd99eb 100644 --- a/core/http/app.go +++ b/core/http/app.go @@ -149,6 +149,18 @@ func API(application *application.Application) (*echo.Echo, error) { // Middleware - StripPathPrefix must be registered early as it uses Rewrite which runs before routing e.Pre(httpMiddleware.StripPathPrefix()) + // Stamp the configured external base URL into each request context so + // middleware.BaseURL can treat it as authoritative for self-referential + // links. Registered as Pre so it runs before routing and handlers. + if extBaseURL := application.ApplicationConfig().ExternalBaseURL; extBaseURL != "" { + e.Pre(func(next echo.HandlerFunc) echo.HandlerFunc { + return func(c echo.Context) error { + c.Set("_external_base_url", extBaseURL) + return next(c) + } + }) + } + e.Pre(middleware.RemoveTrailingSlash()) if application.ApplicationConfig().MachineTag != "" { diff --git a/core/http/auth/db_sqlite.go b/core/http/auth/db_sqlite.go index 5c13ecf05..eecabe4a5 100644 --- a/core/http/auth/db_sqlite.go +++ b/core/http/auth/db_sqlite.go @@ -3,10 +3,51 @@ package auth import ( + "net/url" + "strings" + "gorm.io/driver/sqlite" "gorm.io/gorm" ) func openSQLiteDialector(path string) (gorm.Dialector, error) { - return sqlite.Open(path), nil + return sqlite.Open(buildSQLiteDSN(path)), nil +} + +// buildSQLiteDSN augments a SQLite file path with connection pragmas that make +// the auth DB resilient on slow or contended storage. +// +// - _busy_timeout=5000 makes SQLite retry for up to 5s on SQLITE_BUSY instead +// of failing immediately. 
Network-backed storage (SMB/CIFS/NFS, e.g. Azure +// Files) is prone to transient lock contention during migration (see #10506). +// - _txlock=immediate takes the write lock at BEGIN, avoiding deadlocks when a +// read transaction later upgrades to a write during AutoMigrate. +// +// We deliberately do NOT set WAL journal mode: WAL relies on a shared-memory +// mmap that does not work over SMB/NFS, which is exactly the failing case here. +// +// Caller-supplied values for either pragma are preserved. +func buildSQLiteDSN(path string) string { + base := path + rawQuery := "" + if i := strings.IndexByte(path, '?'); i >= 0 { + base = path[:i] + rawQuery = path[i+1:] + } + + values, err := url.ParseQuery(rawQuery) + if err != nil { + // An unparseable query string means a hand-crafted DSN we should not + // risk corrupting; leave it untouched. + return path + } + + if values.Get("_busy_timeout") == "" { + values.Set("_busy_timeout", "5000") + } + if values.Get("_txlock") == "" { + values.Set("_txlock", "immediate") + } + + return base + "?" + values.Encode() } diff --git a/core/http/auth/db_sqlite_test.go b/core/http/auth/db_sqlite_test.go new file mode 100644 index 000000000..f1dc4e404 --- /dev/null +++ b/core/http/auth/db_sqlite_test.go @@ -0,0 +1,57 @@ +//go:build auth + +package auth + +import ( + "net/url" + "strings" + + . "github.com/onsi/ginkgo/v2" + . "github.com/onsi/gomega" +) + +// parseDSN splits a "base?query" DSN into its base and decoded query values so +// assertions don't depend on url.Values.Encode()'s key ordering. 
+func parseDSN(dsn string) (string, url.Values) { + base := dsn + rawQuery := "" + if i := strings.IndexByte(dsn, '?'); i >= 0 { + base = dsn[:i] + rawQuery = dsn[i+1:] + } + values, err := url.ParseQuery(rawQuery) + Expect(err).ToNot(HaveOccurred()) + return base, values +} + +var _ = Describe("buildSQLiteDSN", func() { + It("adds busy_timeout and txlock to a plain file path", func() { + base, values := parseDSN(buildSQLiteDSN("/data/database.db")) + Expect(base).To(Equal("/data/database.db")) + Expect(values.Get("_busy_timeout")).To(Equal("5000")) + Expect(values.Get("_txlock")).To(Equal("immediate")) + }) + + It("adds pragmas to an in-memory database", func() { + base, values := parseDSN(buildSQLiteDSN(":memory:")) + Expect(base).To(Equal(":memory:")) + Expect(values.Get("_busy_timeout")).To(Equal("5000")) + Expect(values.Get("_txlock")).To(Equal("immediate")) + }) + + It("preserves an existing query string", func() { + base, values := parseDSN(buildSQLiteDSN("/data/database.db?cache=shared")) + Expect(base).To(Equal("/data/database.db")) + Expect(values.Get("cache")).To(Equal("shared")) + Expect(values.Get("_busy_timeout")).To(Equal("5000")) + Expect(values.Get("_txlock")).To(Equal("immediate")) + }) + + It("does not override a caller-supplied busy_timeout or txlock", func() { + _, values := parseDSN(buildSQLiteDSN("/data/database.db?_busy_timeout=1000&_txlock=deferred")) + Expect(values["_busy_timeout"]).To(HaveLen(1), "_busy_timeout should not be duplicated") + Expect(values.Get("_busy_timeout")).To(Equal("1000")) + Expect(values["_txlock"]).To(HaveLen(1), "_txlock should not be duplicated") + Expect(values.Get("_txlock")).To(Equal("deferred")) + }) +}) diff --git a/core/http/auth/features.go b/core/http/auth/features.go index 615e82a49..8dbb32a03 100644 --- a/core/http/auth/features.go +++ b/core/http/auth/features.go @@ -48,6 +48,10 @@ var RouteFeatureRegistry = []RouteFeature{ {"POST", "/v1/audio/diarization", FeatureAudioDiarization}, {"POST", 
"/audio/diarization", FeatureAudioDiarization}, + // Audio classification (sound-event tagging) + {"POST", "/v1/audio/classification", FeatureAudioClassification}, + {"POST", "/audio/classification", FeatureAudioClassification}, + // Audio speech / TTS {"POST", "/v1/audio/speech", FeatureAudioSpeech}, {"POST", "/audio/speech", FeatureAudioSpeech}, @@ -172,6 +176,7 @@ func APIFeatureMetas() []FeatureMeta { {FeatureAudioSpeech, "Audio Speech / TTS", true}, {FeatureAudioTranscription, "Audio Transcription", true}, {FeatureAudioDiarization, "Audio Diarization", true}, + {FeatureAudioClassification, "Audio Classification", true}, {FeatureVAD, "Voice Activity Detection", true}, {FeatureDetection, "Detection", true}, {FeatureVideo, "Video Generation", true}, diff --git a/core/http/auth/permissions.go b/core/http/auth/permissions.go index 47c4d64e1..1795792f9 100644 --- a/core/http/auth/permissions.go +++ b/core/http/auth/permissions.go @@ -38,24 +38,25 @@ const ( FeatureQuantization = "quantization" // API features (default ON for new users) - FeatureChat = "chat" - FeatureImages = "images" - FeatureAudioSpeech = "audio_speech" - FeatureAudioTranscription = "audio_transcription" - FeatureAudioDiarization = "audio_diarization" - FeatureVAD = "vad" - FeatureDetection = "detection" - FeatureVideo = "video" - FeatureEmbeddings = "embeddings" - FeatureSound = "sound" - FeatureRealtime = "realtime" - FeatureRerank = "rerank" - FeatureTokenize = "tokenize" - FeatureMCP = "mcp" - FeatureStores = "stores" - FeatureFaceRecognition = "face_recognition" - FeatureVoiceRecognition = "voice_recognition" - FeatureAudioTransform = "audio_transform" + FeatureChat = "chat" + FeatureImages = "images" + FeatureAudioSpeech = "audio_speech" + FeatureAudioTranscription = "audio_transcription" + FeatureAudioDiarization = "audio_diarization" + FeatureAudioClassification = "audio_classification" + FeatureVAD = "vad" + FeatureDetection = "detection" + FeatureVideo = "video" + FeatureEmbeddings = 
"embeddings"
+	FeatureSound               = "sound"
+	FeatureRealtime            = "realtime"
+	FeatureRerank              = "rerank"
+	FeatureTokenize            = "tokenize"
+	FeatureMCP                 = "mcp"
+	FeatureStores              = "stores"
+	FeatureFaceRecognition     = "face_recognition"
+	FeatureVoiceRecognition    = "voice_recognition"
+	FeatureAudioTransform      = "audio_transform"
 	// FeaturePIIFilter gates the synchronous PII analyze/redact service
 	// (POST /api/pii/{analyze,redact}). Default ON like the other API
 	// features; the admin-only events log is gated separately in-handler.
@@ -71,7 +72,7 @@ var GeneralFeatures = []string{FeatureFineTuning, FeatureQuantization}
 // APIFeatures lists API endpoint features (default ON).
 var APIFeatures = []string{
 	FeatureChat, FeatureImages, FeatureAudioSpeech, FeatureAudioTranscription,
-	FeatureAudioDiarization,
+	FeatureAudioDiarization, FeatureAudioClassification,
 	FeatureVAD, FeatureDetection, FeatureVideo, FeatureEmbeddings,
 	FeatureSound, FeatureRealtime, FeatureRerank, FeatureTokenize, FeatureMCP,
 	FeatureStores, FeatureFaceRecognition, FeatureVoiceRecognition, FeatureAudioTransform,
diff --git a/core/http/endpoints/localai/agent_collections.go b/core/http/endpoints/localai/agent_collections.go
index 17997bcfe..98850d6d1 100644
--- a/core/http/endpoints/localai/agent_collections.go
+++ b/core/http/endpoints/localai/agent_collections.go
@@ -70,7 +70,7 @@ func UploadToCollectionEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		file, err := c.FormFile("file")
 		if err != nil {
 			return c.JSON(http.StatusBadRequest, map[string]string{"error": "file required"})
@@ -116,7 +116,7 @@ func ListCollectionEntriesEndpoint(app *application.Application) echo.HandlerFun
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		entries, err := svc.ListCollectionEntriesForUser(userID, c.Param("name"))
+		entries, err := svc.ListCollectionEntriesForUser(userID, decodedParam(c, "name"))
 		if err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -139,7 +139,7 @@ func GetCollectionEntryContentEndpoint(app *application.Application) echo.Handle
 		if err != nil {
 			entry = entryParam
 		}
-		content, chunkCount, err := svc.GetCollectionEntryContentForUser(userID, c.Param("name"), entry)
+		content, chunkCount, err := svc.GetCollectionEntryContentForUser(userID, decodedParam(c, "name"), entry)
 		if err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -164,7 +164,7 @@ func SearchCollectionEndpoint(app *application.Application) echo.HandlerFunc {
 		if err := c.Bind(&payload); err != nil {
 			return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
 		}
-		results, err := svc.SearchCollectionForUser(userID, c.Param("name"), payload.Query, payload.MaxResults)
+		results, err := svc.SearchCollectionForUser(userID, decodedParam(c, "name"), payload.Query, payload.MaxResults)
 		if err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -182,7 +182,7 @@ func ResetCollectionEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		if err := svc.ResetCollectionForUser(userID, c.Param("name")); err != nil {
+		if err := svc.ResetCollectionForUser(userID, decodedParam(c, "name")); err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
 			}
@@ -202,7 +202,7 @@ func DeleteCollectionEntryEndpoint(app *application.Application) echo.HandlerFun
 		if err := c.Bind(&payload); err != nil {
 			return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
 		}
-		remaining, err := svc.DeleteCollectionEntryForUser(userID, c.Param("name"), payload.Entry)
+		remaining, err := svc.DeleteCollectionEntryForUser(userID, decodedParam(c, "name"), payload.Entry)
 		if err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -230,7 +230,7 @@ func AddCollectionSourceEndpoint(app *application.Application) echo.HandlerFunc
 		if payload.UpdateInterval < 1 {
 			payload.UpdateInterval = 60
 		}
-		if err := svc.AddCollectionSourceForUser(userID, c.Param("name"), payload.URL, payload.UpdateInterval); err != nil {
+		if err := svc.AddCollectionSourceForUser(userID, decodedParam(c, "name"), payload.URL, payload.UpdateInterval); err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
 			}
@@ -250,7 +250,7 @@ func RemoveCollectionSourceEndpoint(app *application.Application) echo.HandlerFu
 		if err := c.Bind(&payload); err != nil {
 			return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
 		}
-		if err := svc.RemoveCollectionSourceForUser(userID, c.Param("name"), payload.URL); err != nil {
+		if err := svc.RemoveCollectionSourceForUser(userID, decodedParam(c, "name"), payload.URL); err != nil {
 			return c.JSON(http.StatusInternalServerError, map[string]string{"error": err.Error()})
 		}
 		return c.JSON(http.StatusOK, map[string]string{"status": "ok"})
@@ -267,7 +267,7 @@ func GetCollectionEntryRawFileEndpoint(app *application.Application) echo.Handle
 		if err != nil {
 			entry = entryParam
 		}
-		fpath, err := svc.GetCollectionEntryFilePathForUser(userID, c.Param("name"), entry)
+		fpath, err := svc.GetCollectionEntryFilePathForUser(userID, decodedParam(c, "name"), entry)
 		if err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
@@ -282,7 +282,7 @@ func ListCollectionSourcesEndpoint(app *application.Application) echo.HandlerFun
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		sources, err := svc.ListCollectionSourcesForUser(userID, c.Param("name"))
+		sources, err := svc.ListCollectionSourcesForUser(userID, decodedParam(c, "name"))
 		if err != nil {
 			if strings.Contains(err.Error(), "not found") {
 				return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
diff --git a/core/http/endpoints/localai/agent_collections_param_test.go b/core/http/endpoints/localai/agent_collections_param_test.go
new file mode 100644
index 000000000..e77ab9561
--- /dev/null
+++ b/core/http/endpoints/localai/agent_collections_param_test.go
@@ -0,0 +1,49 @@
+package localai
+
+import (
+	"net/http"
+	"net/http/httptest"
+
+	"github.com/labstack/echo/v4"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+// Regression for #10443: agent/collection names carry a "legacy-api-key:"
+// prefix, so the ':' is percent-encoded as %3A in the request path. Echo routes
+// such paths via URL.RawPath and stores the path-param value still escaped, so
+// handlers must URL-decode it before looking the collection up in the store -
+// otherwise the lookup sees "legacy-api-key%3ALiteraryResearch" and 404s.
+var _ = Describe("decodedParam", func() {
+	var e *echo.Echo
+
+	BeforeEach(func() {
+		e = echo.New()
+	})
+
+	// route runs a request through Echo's real router so the path param is
+	// populated exactly as it would be in production, then returns the decoded
+	// value the handler would observe.
+	route := func(rawPath string) string {
+		var got string
+		e.GET("/api/agents/collections/:name/upload", func(c echo.Context) error {
+			got = decodedParam(c, "name")
+			return c.NoContent(http.StatusOK)
+		})
+		req := httptest.NewRequest(http.MethodGet, rawPath, nil)
+		rec := httptest.NewRecorder()
+		e.ServeHTTP(rec, req)
+		Expect(rec.Code).To(Equal(http.StatusOK))
+		return got
+	}
+
+	It("decodes a percent-encoded colon in the collection name", func() {
+		got := route("/api/agents/collections/legacy-api-key%3ALiteraryResearch/upload")
+		Expect(got).To(Equal("legacy-api-key:LiteraryResearch"))
+	})
+
+	It("leaves an unencoded name untouched", func() {
+		got := route("/api/agents/collections/PlainCollection/upload")
+		Expect(got).To(Equal("PlainCollection"))
+	})
+})
diff --git a/core/http/endpoints/localai/agents.go b/core/http/endpoints/localai/agents.go
index 2bf2b3263..fa09b557e 100644
--- a/core/http/endpoints/localai/agents.go
+++ b/core/http/endpoints/localai/agents.go
@@ -6,6 +6,7 @@ import (
 	"io"
 	"maps"
 	"net/http"
+	"net/url"
 	"os"
 	"path/filepath"
 	"slices"
@@ -33,6 +34,22 @@ func getUserID(c echo.Context) string {
 	return user.ID
 }
 
+// decodedParam returns the named path parameter, URL-decoding it.
+//
+// Echo routes a request via URL.RawPath whenever the path contains
+// percent-encoded characters (e.g. %3A for ':'), and in that case stores the
+// matched path-param value raw/escaped. Agent and collection names carry a
+// "legacy-api-key:" prefix, so the ':' arrives as %3A and the raw param no
+// longer matches the stored name. Callers must unescape before lookups.
+// Falls back to the raw value if it isn't valid percent-encoding.
+func decodedParam(c echo.Context, name string) string {
+	raw := c.Param(name)
+	if decoded, err := url.PathUnescape(raw); err == nil {
+		return decoded
+	}
+	return raw
+}
+
 // isAdminUser returns true if the authenticated user has admin role.
 func isAdminUser(c echo.Context) bool {
 	user := auth.GetUser(c)
@@ -127,7 +144,7 @@ func GetAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		statuses := svc.ListAgentsForUser(userID)
 		active, exists := statuses[name]
@@ -142,7 +159,7 @@ func UpdateAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		var cfg state.AgentConfig
 		if err := c.Bind(&cfg); err != nil {
 			return c.JSON(http.StatusBadRequest, map[string]string{"error": err.Error()})
@@ -161,7 +178,7 @@ func DeleteAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		if err := svc.DeleteAgentForUser(userID, name); err != nil {
 			return c.JSON(http.StatusInternalServerError, map[string]string{"error": err.Error()})
 		}
@@ -173,7 +190,7 @@ func GetAgentConfigEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		cfg := svc.GetAgentConfigForUser(userID, name)
 		if cfg == nil {
 			return c.JSON(http.StatusNotFound, map[string]string{"error": "Agent not found"})
@@ -186,7 +203,7 @@ func PauseAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		if err := svc.PauseAgentForUser(userID, c.Param("name")); err != nil {
+		if err := svc.PauseAgentForUser(userID, decodedParam(c, "name")); err != nil {
 			return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
 		}
 		return c.JSON(http.StatusOK, map[string]string{"status": "ok"})
@@ -197,7 +214,7 @@ func ResumeAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		if err := svc.ResumeAgentForUser(userID, c.Param("name")); err != nil {
+		if err := svc.ResumeAgentForUser(userID, decodedParam(c, "name")); err != nil {
 			return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
 		}
 		return c.JSON(http.StatusOK, map[string]string{"status": "ok"})
@@ -208,7 +225,7 @@ func GetAgentStatusEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 
 		history := svc.GetAgentStatusForUser(userID, name)
 		if history == nil {
@@ -241,7 +258,7 @@ func GetAgentObservablesEndpoint(app *application.Application) echo.HandlerFunc
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 
 		history, err := svc.GetAgentObservablesForUser(userID, name)
 		if err != nil {
@@ -261,7 +278,7 @@ func ClearAgentObservablesEndpoint(app *application.Application) echo.HandlerFun
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		if err := svc.ClearAgentObservablesForUser(userID, name); err != nil {
 			return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
 		}
@@ -273,7 +290,7 @@ func ChatWithAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		var payload struct {
 			Message string `json:"message"`
 		}
@@ -302,7 +319,7 @@ func AgentSSEEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 
 		// Try local SSE manager first
 		manager := svc.GetSSEManagerForUser(userID, name)
@@ -334,7 +351,7 @@ func ExportAgentEndpoint(app *application.Application) echo.HandlerFunc {
 	return func(c echo.Context) error {
 		svc := app.AgentPoolService()
 		userID := effectiveUserID(c)
-		name := c.Param("name")
+		name := decodedParam(c, "name")
 		data, err := svc.ExportAgentForUser(userID, name)
 		if err != nil {
 			return c.JSON(http.StatusNotFound, map[string]string{"error": err.Error()})
diff --git a/core/http/endpoints/localai/aliases.go b/core/http/endpoints/localai/aliases.go
new file mode 100644
index 000000000..923e22c63
--- /dev/null
+++ b/core/http/endpoints/localai/aliases.go
@@ -0,0 +1,33 @@
+package localai
+
+import (
+	"net/http"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/config"
+)
+
+// AliasInfo is one alias -> target pair.
+type AliasInfo struct {
+	Name   string `json:"name"`
+	Target string `json:"target"`
+}
+
+// ListAliasesEndpoint returns every configured model alias and its target.
+//
+// @Summary List model aliases
+// @Tags models
+// @Success 200 {array} AliasInfo
+// @Router /api/aliases [get]
+func ListAliasesEndpoint(cl *config.ModelConfigLoader) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		// Non-nil so an empty result marshals as [] rather than null.
+		out := []AliasInfo{}
+		for _, cfg := range cl.GetAllModelsConfigs() {
+			if cfg.IsAlias() {
+				out = append(out, AliasInfo{Name: cfg.Name, Target: cfg.Alias})
+			}
+		}
+		return c.JSON(http.StatusOK, out)
+	}
+}
diff --git a/core/http/endpoints/localai/aliases_test.go b/core/http/endpoints/localai/aliases_test.go
new file mode 100644
index 000000000..e1c44898a
--- /dev/null
+++ b/core/http/endpoints/localai/aliases_test.go
@@ -0,0 +1,57 @@
+package localai_test
+
+import (
+	"net/http"
+	"net/http/httptest"
+	"os"
+	"path/filepath"
+
+	"github.com/labstack/echo/v4"
+	"github.com/mudler/LocalAI/core/config"
+	. "github.com/mudler/LocalAI/core/http/endpoints/localai"
+	. "github.com/onsi/ginkgo/v2"
+	. "github.com/onsi/gomega"
+)
+
+var _ = Describe("ListAliasesEndpoint", func() {
+	var tempDir string
+
+	BeforeEach(func() {
+		var err error
+		tempDir, err = os.MkdirTemp("", "localai-aliases-test")
+		Expect(err).ToNot(HaveOccurred())
+	})
+	AfterEach(func() {
+		_ = os.RemoveAll(tempDir)
+	})
+
+	It("returns only alias configs as name/target pairs", func() {
+		// Seed one real model and one alias pointing at it.
+		Expect(os.WriteFile(
+			filepath.Join(tempDir, "real.yaml"),
+			[]byte("name: real\nbackend: llama-cpp\nmodel: foo\n"),
+			0644,
+		)).To(Succeed())
+		Expect(os.WriteFile(
+			filepath.Join(tempDir, "gpt-4.yaml"),
+			[]byte("name: gpt-4\nalias: real\n"),
+			0644,
+		)).To(Succeed())
+
+		loader := config.NewModelConfigLoader(tempDir)
+		Expect(loader.LoadModelConfigsFromPath(tempDir)).To(Succeed())
+
+		app := echo.New()
+		app.GET("/api/aliases", ListAliasesEndpoint(loader))
+
+		req := httptest.NewRequest("GET", "/api/aliases", nil)
+		rec := httptest.NewRecorder()
+		app.ServeHTTP(rec, req)
+
+		Expect(rec.Code).To(Equal(http.StatusOK))
+		Expect(rec.Body.String()).To(ContainSubstring(`"name":"gpt-4"`))
+		Expect(rec.Body.String()).To(ContainSubstring(`"target":"real"`))
+		// The real model must not appear as an alias entry.
+		Expect(rec.Body.String()).ToNot(ContainSubstring(`"name":"real"`))
+	})
+})
diff --git a/core/http/endpoints/localai/api_instructions.go b/core/http/endpoints/localai/api_instructions.go
index 2ca856a62..405921e5e 100644
--- a/core/http/endpoints/localai/api_instructions.go
+++ b/core/http/endpoints/localai/api_instructions.go
@@ -32,9 +32,9 @@ var instructionDefs = []instructionDef{
 	},
 	{
 		Name:        "audio",
-		Description: "Text-to-speech, voice activity detection, transcription, speaker diarization, and sound generation",
+		Description: "Text-to-speech, voice activity detection, transcription, speaker diarization, sound classification, and sound generation",
 		Tags:        []string{"audio"},
-		Intro:       "Diarization (/v1/audio/diarization) returns speaker-labelled time segments. Backends with native ASR-diarization (vibevoice-cpp) can also emit per-segment text via include_text=true; backends with a dedicated pipeline (sherpa-onnx + pyannote) emit segmentation only. Response formats: json (default), verbose_json (adds speakers summary + text), rttm (NIST format).",
+		Intro:       "Diarization (/v1/audio/diarization) returns speaker-labelled time segments. Backends with native ASR-diarization (vibevoice-cpp) can also emit per-segment text via include_text=true; backends with a dedicated pipeline (sherpa-onnx + pyannote) emit segmentation only. Response formats: json (default), verbose_json (adds speakers summary + text), rttm (NIST format). Sound classification (/v1/audio/classification) returns scored AudioSet sound-event tags (audio tagging via the ced backend); top_k and threshold control the returned set.",
 	},
 	{
 		Name:        "images",
diff --git a/core/http/endpoints/localai/import_model.go b/core/http/endpoints/localai/import_model.go
index dc225abdd..54a80a9cc 100644
--- a/core/http/endpoints/localai/import_model.go
+++ b/core/http/endpoints/localai/import_model.go
@@ -181,6 +181,12 @@ func ImportModelEndpoint(cl *config.ModelConfigLoader, appConfig *config.Applica
 			return c.JSON(http.StatusBadRequest, ModelResponse{Success: false, Error: msg})
 		}
 
+		// Reject aliases whose target is missing, chained, or disabled so a
+		// dangling alias can't be persisted and surface as a runtime error later.
+		if err := cl.ValidateAliasTarget(&modelConfig); err != nil {
+			return c.JSON(http.StatusBadRequest, ModelResponse{Success: false, Error: err.Error()})
+		}
+
 		// Create the configuration file
 		configPath := filepath.Join(appConfig.SystemState.Model.ModelsPath, modelConfig.Name+".yaml")
 		if err := utils.VerifyPath(modelConfig.Name+".yaml", appConfig.SystemState.Model.ModelsPath); err != nil {
diff --git a/core/http/endpoints/localai/nodes.go b/core/http/endpoints/localai/nodes.go
index 5a6edab22..d6c44e383 100644
--- a/core/http/endpoints/localai/nodes.go
+++ b/core/http/endpoints/localai/nodes.go
@@ -70,17 +70,20 @@ func GetNodeEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
 
 // RegisterNodeRequest is the request body for registering a new worker node.
 type RegisterNodeRequest struct {
-	Name          string            `json:"name"`
-	NodeType      string            `json:"node_type,omitempty"` // "backend" (default) or "agent"
-	Address       string            `json:"address"`
-	HTTPAddress   string            `json:"http_address,omitempty"`
-	Token         string            `json:"token,omitempty"`
-	TotalVRAM     uint64            `json:"total_vram,omitempty"`
-	AvailableVRAM uint64            `json:"available_vram,omitempty"`
-	TotalRAM      uint64            `json:"total_ram,omitempty"`
-	AvailableRAM  uint64            `json:"available_ram,omitempty"`
-	GPUVendor     string            `json:"gpu_vendor,omitempty"`
-	Labels        map[string]string `json:"labels,omitempty"`
+	Name                 string            `json:"name"`
+	NodeType             string            `json:"node_type,omitempty"` // "backend" (default) or "agent"
+	Address              string            `json:"address"`
+	HTTPAddress          string            `json:"http_address,omitempty"`
+	Token                string            `json:"token,omitempty"`
+	TotalVRAM            uint64            `json:"total_vram,omitempty"`
+	AvailableVRAM        uint64            `json:"available_vram,omitempty"`
+	TotalRAM             uint64            `json:"total_ram,omitempty"`
+	AvailableRAM         uint64            `json:"available_ram,omitempty"`
+	GPUVendor            string            `json:"gpu_vendor,omitempty"`
+	// GPUComputeCapability is the worker GPU's compute capability ("major.minor",
+	// e.g. "12.1" for GB10). Used by the router for per-arch option tuning.
+	GPUComputeCapability string            `json:"gpu_compute_capability,omitempty"`
+	Labels               map[string]string `json:"labels,omitempty"`
 	// MaxReplicasPerModel is the per-node cap on replicas of any single model.
 	// Workers older than this field omit it; we coerce 0 → 1 below to preserve
 	// historical single-replica behavior.
@@ -152,17 +155,18 @@ func RegisterNodeEndpoint(registry *nodes.NodeRegistry, expectedToken string, au
 		}
 
 		node := &nodes.BackendNode{
-			Name:                req.Name,
-			NodeType:            nodeType,
-			Address:             req.Address,
-			HTTPAddress:         req.HTTPAddress,
-			TokenHash:           tokenHash,
-			TotalVRAM:           req.TotalVRAM,
-			AvailableVRAM:       req.AvailableVRAM,
-			TotalRAM:            req.TotalRAM,
-			AvailableRAM:        req.AvailableRAM,
-			GPUVendor:           req.GPUVendor,
-			MaxReplicasPerModel: maxReplicasPerModel,
+			Name:                 req.Name,
+			NodeType:             nodeType,
+			Address:              req.Address,
+			HTTPAddress:          req.HTTPAddress,
+			TokenHash:            tokenHash,
+			TotalVRAM:            req.TotalVRAM,
+			AvailableVRAM:        req.AvailableVRAM,
+			TotalRAM:             req.TotalRAM,
+			AvailableRAM:         req.AvailableRAM,
+			GPUVendor:            req.GPUVendor,
+			GPUComputeCapability: req.GPUComputeCapability,
+			MaxReplicasPerModel:  maxReplicasPerModel,
 		}
 
 		ctx := c.Request().Context()
@@ -381,6 +385,23 @@ func GetNodeModelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
 	}
 }
 
+// ListAllNodeModelsEndpoint returns all loaded models across all healthy nodes.
+// @Summary List all loaded models cluster-wide
+// @Tags Nodes
+// @Success 200 {array} nodes.NodeModel
+// @Router /api/nodes/models [get]
+func ListAllNodeModelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
+	return func(c echo.Context) error {
+		ctx := c.Request().Context()
+		models, err := registry.ListAllLoadedModels(ctx)
+		if err != nil {
+			xlog.Error("Failed to list all node models", "error", err)
+			return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to list node models"))
+		}
+		return c.JSON(http.StatusOK, models)
+	}
+}
+
 // DrainNodeEndpoint sets a node to draining status (no new requests).
 func DrainNodeEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
 	return func(c echo.Context) error {
diff --git a/core/http/endpoints/localai/nodes_test.go b/core/http/endpoints/localai/nodes_test.go
index bca6f42bf..52cef6f03 100644
--- a/core/http/endpoints/localai/nodes_test.go
+++ b/core/http/endpoints/localai/nodes_test.go
@@ -407,4 +407,44 @@ var _ = Describe("Node HTTP handlers", func() {
 			Expect(names).To(ConsistOf("alpha", "beta"))
 		})
 	})
+
+	Describe("ListAllNodeModelsEndpoint", func() {
+		It("returns an empty list when no models are loaded", func() {
+			e := echo.New()
+			req := httptest.NewRequest(http.MethodGet, "/", nil)
+			rec := httptest.NewRecorder()
+			c := e.NewContext(req, rec)
+
+			handler := ListAllNodeModelsEndpoint(registry)
+			Expect(handler(c)).To(Succeed())
+			Expect(rec.Code).To(Equal(http.StatusOK))
+
+			var list []nodes.NodeModel
+			Expect(json.Unmarshal(rec.Body.Bytes(), &list)).To(Succeed())
+			Expect(list).To(BeEmpty())
+		})
+
+		It("returns loaded models across healthy nodes", func() {
+			ctx := context.Background()
+			Expect(registry.Register(ctx, &nodes.BackendNode{
+				ID: "n1", Name: "alpha", Address: "10.0.0.1:50051", Status: nodes.StatusHealthy,
+			}, true)).To(Succeed())
+			Expect(registry.SetNodeModel(ctx, "n1", "llama-3.3", 0, "loaded", "10.0.0.1:50051", 0)).To(Succeed())
+
+			e := echo.New()
+			req := httptest.NewRequest(http.MethodGet, "/", nil)
+			rec := httptest.NewRecorder()
+			c := e.NewContext(req, rec)
+
+			handler := ListAllNodeModelsEndpoint(registry)
+			Expect(handler(c)).To(Succeed())
+			Expect(rec.Code).To(Equal(http.StatusOK))
+
+			var list []nodes.NodeModel
+			Expect(json.Unmarshal(rec.Body.Bytes(), &list)).To(Succeed())
+			Expect(list).To(HaveLen(1))
+			Expect(list[0].ModelName).To(Equal("llama-3.3"))
+			Expect(list[0].NodeID).To(Equal("n1"))
+		})
+	})
 })
diff --git a/core/http/endpoints/localai/settings.go b/core/http/endpoints/localai/settings.go
index 7d970f820..8033f07d5 100644
--- a/core/http/endpoints/localai/settings.go
+++ b/core/http/endpoints/localai/settings.go
@@ -4,8 +4,6 @@ import (
 	"encoding/json"
 	"io"
 	"net/http"
-	"os"
-	"path/filepath"
 	"time"
 
 	"github.com/labstack/echo/v4"
@@ -110,6 +108,18 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
 			})
 		}
 
+		// Read whatever is already persisted: it is both the source of truth
+		// for branding asset filenames (below) and the base we merge this
+		// request onto before writing. A read failure must not let a Save
+		// silently discard the existing settings — surface it instead.
+		persisted, err := appConfig.ReadPersistedSettings()
+		if err != nil {
+			return c.JSON(http.StatusInternalServerError, schema.SettingsResponse{
+				Success: false,
+				Error:   "Failed to read existing settings: " + err.Error(),
+			})
+		}
+
 		// Branding asset filenames are owned exclusively by
 		// /api/branding/asset/{kind} (upload/delete). The Settings page also
 		// round-trips them via GET /api/settings, but its local state is stale
@@ -118,11 +128,9 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
 		// at page open. Replace whatever the body sent for these three fields
 		// with the values currently on disk so /api/settings can never
 		// regress them.
-		if existing, err := appConfig.ReadPersistedSettings(); err == nil {
-			settings.LogoFile = existing.LogoFile
-			settings.LogoHorizontalFile = existing.LogoHorizontalFile
-			settings.FaviconFile = existing.FaviconFile
-		}
+		settings.LogoFile = persisted.LogoFile
+		settings.LogoHorizontalFile = persisted.LogoHorizontalFile
+		settings.FaviconFile = persisted.FaviconFile
 
 		// The UI reads ApiKeys from GET /api/settings, which already returns the
 		// merged env+runtime list. When the user clicks Save, the same merged
@@ -145,16 +153,17 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
 			settings.ApiKeys = &runtimeOnly
 		}
 
-		settingsFile := filepath.Join(appConfig.DynamicConfigsDir, "runtime_settings.json")
-		settingsJSON, err := json.MarshalIndent(settings, "", " ")
-		if err != nil {
-			return c.JSON(http.StatusInternalServerError, schema.SettingsResponse{
-				Success: false,
-				Error:   "Failed to marshal settings: " + err.Error(),
-			})
-		}
-
-		if err := os.WriteFile(settingsFile, settingsJSON, 0600); err != nil {
+		// Persist as a partial update: overlay only the fields this request set
+		// onto the settings already on disk. Focused admin pages POST just the
+		// keys they own (the Middleware proxy tab sends only mitm_listen; the
+		// detector table only pii_default_detectors), so writing the request
+		// body verbatim would null every unrelated setting (the no-omitempty
+		// api_keys / pii_default_detectors fields even round-trip as JSON
+		// null). The full Settings page still round-trips every field, so its
+		// Save is unchanged.
+		toPersist := persisted
+		toPersist.MergeNonNil(settings)
+		if err := appConfig.WritePersistedSettings(toPersist); err != nil {
 			return c.JSON(http.StatusInternalServerError, schema.SettingsResponse{
 				Success: false,
 				Error:   "Failed to write settings file: " + err.Error(),
@@ -262,7 +271,14 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
 			}
 		}
 
-		if settings.MITMListen != nil {
+		// Rebuild the MITM listener when its address OR the instance-wide
+		// default detectors change. The per-host detector map is resolved once
+		// at listener start (startMITMLocked → ResolvePIIPolicy), so a
+		// default-detector change is otherwise invisible to cloud-proxy traffic
+		// until the next restart — an admin toggling a default detector would
+		// see no redaction. RestartMITM is a no-op when the listener is
+		// disabled (empty address).
+		if settings.MITMListen != nil || settings.PIIDefaultDetectors != nil {
 			if err := app.RestartMITM(); err != nil {
 				xlog.Error("Failed to restart MITM proxy", "error", err)
 				return c.JSON(http.StatusInternalServerError, schema.SettingsResponse{
diff --git a/core/http/endpoints/localai/settings_test.go b/core/http/endpoints/localai/settings_test.go
index 25c84e1b7..3974c5045 100644
--- a/core/http/endpoints/localai/settings_test.go
+++ b/core/http/endpoints/localai/settings_test.go
@@ -52,6 +52,10 @@ var _ = Describe("Settings endpoints", func() {
 		// Settings are persisted here; set after construction since there's no
 		// dedicated AppOption for it.
 		app.ApplicationConfig().DynamicConfigsDir = tmp
+		// Contain the MITM CA inside tmp too. The partial-save spec flips
+		// mitm_listen, which starts the listener and writes a CA; without this
+		// it defaults to ./mitm-ca and litters the package source tree.
+		app.ApplicationConfig().MITMCADir = filepath.Join(tmp, "mitm-ca")
 
 		e = echo.New()
 		e.GET("/api/settings", GetSettingsEndpoint(app))
@@ -109,6 +113,57 @@ var _ = Describe("Settings endpoints", func() {
 		Expect(err).ToNot(HaveOccurred())
 	})
 
+	// Regression: a focused admin page (the Middleware proxy tab) POSTs only
+	// the one field it owns — mitm_listen. The old handler wrote the request
+	// body verbatim, so every other persisted setting was dropped (and
+	// api_keys / pii_default_detectors, which lack omitempty, were written as
+	// null). A partial POST must now merge onto what is already on disk.
+	It("preserves unrelated persisted settings when a partial POST sets only mitm_listen", func() {
+		// First save establishes a fuller settings file (as the full Settings
+		// page would): galleries, an API key, and the MITM listener. The
+		// listener restart binds a real socket, so use 127.0.0.1:0 for an
+		// ephemeral free port rather than a fixed one that may be in use.
+		rec := post(`{"mitm_listen":"127.0.0.1:0","galleries":[{"name":"g1","url":"http://example/g1"}],"api_keys":["k1"],"pii_default_detectors":["det-a"]}`)
+		Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
+
+		// The Middleware proxy tab then changes only the listen address — the
+		// exact partial body that nulled everything else before the fix.
+		rec = post(`{"mitm_listen":"127.0.0.1:0"}`)
+		Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
+
+		raw, err := os.ReadFile(filepath.Join(tmp, "runtime_settings.json"))
+		Expect(err).ToNot(HaveOccurred())
+		var ondisk config.RuntimeSettings
+		Expect(json.Unmarshal(raw, &ondisk)).To(Succeed())
+
+		Expect(ondisk.MITMListen).ToNot(BeNil())
+		Expect(*ondisk.MITMListen).To(Equal("127.0.0.1:0"), "the changed field should be saved")
+		Expect(ondisk.Galleries).ToNot(BeNil(), "galleries were clobbered by the partial save")
+		Expect(*ondisk.Galleries).To(HaveLen(1))
+		Expect(ondisk.ApiKeys).ToNot(BeNil(), "api_keys were nulled by the partial save")
+		Expect(*ondisk.ApiKeys).To(Equal([]string{"k1"}))
+		Expect(ondisk.PIIDefaultDetectors).ToNot(BeNil(), "pii_default_detectors were nulled by the partial save")
+		Expect(*ondisk.PIIDefaultDetectors).To(Equal([]string{"det-a"}))
+	})
+
+	// The MITM listener resolves its per-host PII detectors once at start
+	// (startMITMLocked → ResolvePIIPolicy), and the handler used to restart it
+	// only when mitm_listen changed. So an admin toggling a default detector
+	// (the Middleware detector table POSTs only pii_default_detectors) left
+	// cloud-proxy traffic unredacted until the next reboot. A
+	// pii_default_detectors change must now rebuild the listener.
+	It("rebuilds the MITM listener when only pii_default_detectors changes", func() {
+		rec := post(`{"mitm_listen":"127.0.0.1:0"}`)
+		Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
+		srv1 := app.MITMServer()
+		Expect(srv1).ToNot(BeNil(), "listener should be running after mitm_listen is set")
+
+		rec = post(`{"pii_default_detectors":["det-a"]}`)
+		Expect(rec.Code).To(Equal(http.StatusOK), rec.Body.String())
+		Expect(app.MITMServer()).ToNot(BeIdenticalTo(srv1),
+			"a default-detector change must restart the listener so it picks up the new detectors")
+	})
+
 	// Residual #9125: enabling the watchdog from a cold (off) state via the
 	// React master toggle must start the live watchdog immediately, without a
 	// restart. The toggle posts watchdog_idle_enabled/busy_enabled=true while
diff --git a/core/http/endpoints/mcp/localai_assistant_test.go b/core/http/endpoints/mcp/localai_assistant_test.go
index 26cd2878f..8de7355c6 100644
--- a/core/http/endpoints/mcp/localai_assistant_test.go
+++ b/core/http/endpoints/mcp/localai_assistant_test.go
@@ -51,6 +51,12 @@ func (stubClient) EditModelConfig(_ context.Context, _ string, _ map[string]any)
 	return nil
 }
 func (stubClient) ReloadModels(_ context.Context) error { return nil }
+func (stubClient) SetAlias(_ context.Context, _, _ string) error {
+	return nil
+}
+func (stubClient) ListAliases(_ context.Context) ([]localaitools.AliasInfo, error) {
+	return nil, nil
+}
 func (stubClient) ListBackends(_ context.Context) ([]localaitools.Backend, error) {
 	return []localaitools.Backend{{Name: "stub-backend", Installed: true}}, nil
 }
diff --git a/core/http/endpoints/openai/realtime.go b/core/http/endpoints/openai/realtime.go
index 343ef4c07..d4d6a0ac4 100644
--- a/core/http/endpoints/openai/realtime.go
+++ b/core/http/endpoints/openai/realtime.go
@@ -12,6 +12,7 @@ import (
 	"os"
 	"strconv"
 	"sync"
+	"sync/atomic"
 	"time"
 
 	"net/http"
@@ -93,16 +94,31 @@ type Session struct {
 	Voice                   string
 	TurnDetection           *types.TurnDetectionUnion // "server_vad", "semantic_vad" or "none"
 	InputAudioTranscription *types.AudioTranscription
-	Tools                   []types.ToolUnion
-	ToolChoice              *types.ToolChoiceUnion
-	Conversations           map[string]*Conversation
-	InputAudioBuffer        []byte
-	AudioBufferLock         sync.Mutex
-	OpusFrames              [][]byte
-	OpusFramesLock          sync.Mutex
-	Instructions            string
-	DefaultConversationID   string
-	ModelInterface          Model
+
+	// SoundDetectionEnabled is set when pipeline.sound_detection names a
+	// sound-event-classification model. When true, each committed utterance is
+	// also run through ModelInterface.SoundDetection and the scored tags are
+	// emitted as a conversation.item.sound_detection event. SoundDetectionTopK
+	// and SoundDetectionThreshold are the knobs passed to that call (defaults:
+	// top_k=5, threshold=0).
+	SoundDetectionEnabled   bool
+	SoundDetectionTopK      int
+	SoundDetectionThreshold float32
+	// SoundDetectionWindowMs / SoundDetectionHopMs, when both > 0, enable
+	// server-side windowing for a sound-only session: the server classifies the
+	// last WindowMs of streamed audio every HopMs (no client commits needed).
+	SoundDetectionWindowMs int
+	SoundDetectionHopMs    int
+	Tools                  []types.ToolUnion
+	ToolChoice             *types.ToolChoiceUnion
+	Conversations          map[string]*Conversation
+	InputAudioBuffer       []byte
+	AudioBufferLock        sync.Mutex
+	OpusFrames             [][]byte
+	OpusFramesLock         sync.Mutex
+	Instructions           string
+	DefaultConversationID  string
+	ModelInterface         Model
 	// The pipeline model config or the config for an any-to-any model
 	ModelConfig     *config.ModelConfig
 	InputSampleRate int
@@ -119,6 +135,18 @@ type Session struct {
 	// pairs are kept together so we never feed an orphaned tool result.
 	MaxHistoryItems int
 
+	// Compaction settings resolved from pipeline.compaction (see resolveCompaction).
+ CompactionEnabled bool + CompactionTrigger int + SummaryModel string + MaxSummaryTokens int + + // summarizerFactory lazily builds the model used for compaction summaries + // when summary_model is configured; nil means reuse the pipeline LLM. + summarizerFactory func() (Model, error) + summarizerOnce sync.Once + summarizerCached Model + // AssistantExecutor is non-nil when the session opted into the in-process // LocalAI Assistant tool surface. Tool calls whose name matches this // executor's catalog are run inproc and their output is fed back to the @@ -226,6 +254,12 @@ type Conversation struct { ID string Items []*types.MessageItemUnion Lock sync.Mutex + // Memory is the rolling summary of items already evicted by compaction. It + // is kept out of Items (so trimRealtimeItems never drops it) and rendered + // as a system message right after the session instructions. + Memory string + // compacting ensures at most one background compaction runs per conversation. + compacting atomic.Bool } func (c *Conversation) ToServer() types.Conversation { @@ -250,6 +284,10 @@ type Model interface { // TranscribeStream transcribes audio incrementally, invoking onDelta for each // transcript text fragment and returning the final aggregated result. TranscribeStream(ctx context.Context, audio, language string, translate, diarize bool, prompt string, onDelta func(text string)) (*schema.TranscriptionResult, error) + // SoundDetection classifies a committed audio window into scored AudioSet + // sound-event tags. topK caps the number of returned tags (0 = backend + // default), threshold drops tags below the given score (0 = keep all). 
+ SoundDetection(ctx context.Context, audio string, topK int, threshold float32) (*schema.SoundClassificationResult, error) PredictConfig() *config.ModelConfig } @@ -399,7 +437,7 @@ func prepareRealtimeConfig(cfg *config.ModelConfig) (errCode, errMsg string, ok return "", "", true } - if cfg.Pipeline.VAD == "" && cfg.Pipeline.Transcription == "" && cfg.Pipeline.TTS == "" && cfg.Pipeline.LLM == "" { + if cfg.Pipeline.VAD == "" && cfg.Pipeline.Transcription == "" && cfg.Pipeline.TTS == "" && cfg.Pipeline.LLM == "" && cfg.Pipeline.SoundDetection == "" { return "invalid_model", "Model is not a pipeline model", false } return "", "", true @@ -469,6 +507,26 @@ func runRealtimeSession(application *application.Application, t Transport, model sttModel := cfg.Pipeline.Transcription + // A sound-detection-only pipeline (sound_detection set, no transcription/LLM) + // activates on sounds, not speech, so it runs WITHOUT the voice VAD: the + // session defaults to turn_detection none and the client drives windowing via + // input_audio_buffer.commit. There is no transcription stage in that case. + soundOnly := cfg.Pipeline.SoundDetection != "" && cfg.Pipeline.Transcription == "" && cfg.Pipeline.LLM == "" + + turnDetection := &types.TurnDetectionUnion{ + ServerVad: &types.ServerVad{ + Threshold: 0.5, + PrefixPaddingMs: 300, + SilenceDurationMs: 500, + CreateResponse: true, + }, + } + inputAudioTranscription := &types.AudioTranscription{Model: sttModel} + if soundOnly { + turnDetection = nil // turn_detection none: no VAD + inputAudioTranscription = nil // no transcription stage + } + // Compose the system prompt: prepend the assistant prompt when we have // one (it teaches the model the safety rules and tool recipes), then the // session's default voice instructions. 
Order matches chat.go's @@ -480,51 +538,56 @@ func runRealtimeSession(application *application.Application, t Transport, model sessionID := generateSessionID() session := &Session{ - ID: sessionID, - TranscriptionOnly: false, - Model: model, - Voice: cfg.TTSConfig.Voice, - Instructions: instructions, - ModelConfig: cfg, - Tools: assistantTools, - AssistantTools: assistantTools, - AssistantExecutor: assistantExecutor, - TurnDetection: &types.TurnDetectionUnion{ - ServerVad: &types.ServerVad{ - Threshold: 0.5, - PrefixPaddingMs: 300, - SilenceDurationMs: 500, - CreateResponse: true, - }, - }, - InputAudioTranscription: &types.AudioTranscription{ - Model: sttModel, - }, - Conversations: make(map[string]*Conversation), - InputSampleRate: defaultRemoteSampleRate, - OutputSampleRate: defaultRemoteSampleRate, - MaxHistoryItems: resolveMaxHistoryItems(cfg), + ID: sessionID, + TranscriptionOnly: false, + Model: model, + Voice: cfg.TTSConfig.Voice, + Instructions: instructions, + ModelConfig: cfg, + Tools: assistantTools, + AssistantTools: assistantTools, + AssistantExecutor: assistantExecutor, + TurnDetection: turnDetection, + InputAudioTranscription: inputAudioTranscription, + Conversations: make(map[string]*Conversation), + InputSampleRate: defaultRemoteSampleRate, + OutputSampleRate: defaultRemoteSampleRate, + MaxHistoryItems: resolveMaxHistoryItems(cfg), + SoundDetectionEnabled: cfg.Pipeline.SoundDetection != "", + SoundDetectionTopK: defaultSoundDetectionTopK, + SoundDetectionThreshold: 0, + SoundDetectionWindowMs: cfg.Pipeline.SoundDetectionWindowMs, + SoundDetectionHopMs: cfg.Pipeline.SoundDetectionHopMs, } + session.CompactionEnabled, session.CompactionTrigger, session.MaxSummaryTokens, session.SummaryModel = resolveCompaction(cfg, session.MaxHistoryItems) // Create a default conversation conversationID := generateConversationID() conversation := &Conversation{ - ID: conversationID, - // TODO: We need to truncate the conversation items when a new item is added and 
we have run out of space. There are multiple places where items - // can be added so we could use a datastructure here that enforces truncation upon addition + ID: conversationID, Items: []*types.MessageItemUnion{}, } session.Conversations[conversationID] = conversation session.DefaultConversationID = conversationID - m, err := newModel( - &cfg.Pipeline, - application.ModelConfigLoader(), - application.ModelLoader(), - application.ApplicationConfig(), - evaluator, - buildRealtimeRoutingContext(application, sessionID), - ) + var m Model + if soundOnly { + m, err = newSoundDetectionOnlyModel( + &cfg.Pipeline, + application.ModelConfigLoader(), + application.ModelLoader(), + application.ApplicationConfig(), + ) + } else { + m, err = newModel( + &cfg.Pipeline, + application.ModelConfigLoader(), + application.ModelLoader(), + application.ApplicationConfig(), + evaluator, + buildRealtimeRoutingContext(application, sessionID), + ) + } if err != nil { xlog.Error("failed to load model", "error", err) sendError(t, "model_load_error", "Failed to load model", "", "") @@ -532,6 +595,18 @@ func runRealtimeSession(application *application.Application, t Transport, model } session.ModelInterface = m + if session.SummaryModel != "" { + summaryModelName := session.SummaryModel + sid := sessionID + session.summarizerFactory = func() (Model, error) { + summaryCfg, lerr := application.ModelConfigLoader().LoadModelConfigFileByNameDefaultOptions(summaryModelName, application.ApplicationConfig()) + if lerr != nil { + return nil, fmt.Errorf("load summary model config %q: %w", summaryModelName, lerr) + } + return newModel(&summaryCfg.Pipeline, application.ModelConfigLoader(), application.ModelLoader(), application.ApplicationConfig(), evaluator, buildRealtimeRoutingContext(application, sid)) + } + } + if cfg.Pipeline.VoiceGateEnabled() { gate, gerr := newVoiceGate( *cfg.Pipeline.VoiceRecognition, @@ -605,6 +680,20 @@ func runRealtimeSession(application *application.Application, t Transport, 
model toggleVAD() + // Server-side sound-detection windowing (option B): for a sound-only session + // with window/hop configured, the server classifies the last window of + // streamed audio on a timer, so the client only has to stream (no commits). + // This runs independent of VAD (sound events are not speech). + var soundWindowDone chan struct{} + if soundOnly && session.SoundDetectionWindowMs > 0 && session.SoundDetectionHopMs > 0 { + soundWindowDone = make(chan struct{}) + wg.Go(func() { + handleSoundWindow(session, t, soundWindowDone) + }) + xlog.Debug("Starting server-side sound-detection windowing", + "window_ms", session.SoundDetectionWindowMs, "hop_ms", session.SoundDetectionHopMs) + } + for { msg, err = t.ReadEvent() if err != nil { @@ -748,6 +837,15 @@ func runRealtimeSession(application *application.Application, t Transport, model commitUtterance(respCtx, allAudio, session, conversation, t) }() + case types.InputAudioBufferClearEvent: + xlog.Debug("recv", "message", string(msg)) + // Discard a partially-captured utterance so the client can restart + // input cleanly without the stale buffer leaking into the next commit. 
+ clearInputAudio(session) + sendEvent(t, types.InputAudioBufferClearedEvent{ + ServerEventBase: types.ServerEventBase{EventID: e.EventID}, + }) + case types.ConversationItemCreateEvent: xlog.Debug("recv", "message", string(msg)) // Add the item to the conversation @@ -782,7 +880,39 @@ func runRealtimeSession(application *application.Application, t Transport, model }) case types.ConversationItemDeleteEvent: - sendError(t, "not_implemented", "Deleting items not implemented", "", "event_TODO") + xlog.Debug("recv", "message", string(msg)) + if e.ItemID == "" { + sendError(t, "invalid_item_id", "Need item_id, but none specified", "", "event_TODO") + continue + } + conversation.Lock.Lock() + updated, ok := deleteItem(conversation.Items, e.ItemID) + conversation.Items = updated + conversation.Lock.Unlock() + if !ok { + sendError(t, "invalid_item_id", "Item to delete not found", "", "event_TODO") + continue + } + sendEvent(t, types.ConversationItemDeletedEvent{ + ServerEventBase: types.ServerEventBase{EventID: e.EventID}, + ItemID: e.ItemID, + }) + + case types.ConversationItemTruncateEvent: + xlog.Debug("recv", "message", string(msg)) + conversation.Lock.Lock() + ok := truncateAssistantText(conversation.Items, e.ItemID, e.ContentIndex) + conversation.Lock.Unlock() + if !ok { + sendError(t, "invalid_item_id", "Item to truncate not found", "", "event_TODO") + continue + } + sendEvent(t, types.ConversationItemTruncatedEvent{ + ServerEventBase: types.ServerEventBase{EventID: e.EventID}, + ItemID: e.ItemID, + ContentIndex: e.ContentIndex, + AudioEndMs: e.AudioEndMs, + }) case types.ConversationItemRetrieveEvent: xlog.Debug("recv", "message", string(msg)) @@ -795,21 +925,7 @@ func runRealtimeSession(application *application.Application, t Transport, model conversation.Lock.Lock() var retrievedItem types.MessageItemUnion for _, item := range conversation.Items { - // We need to check ID in the union - var id string - if item.System != nil { - id = item.System.ID - } else if 
item.User != nil { - id = item.User.ID - } else if item.Assistant != nil { - id = item.Assistant.ID - } else if item.FunctionCall != nil { - id = item.FunctionCall.ID - } else if item.FunctionCallOutput != nil { - id = item.FunctionCallOutput.ID - } - - if id == e.ItemID { + if itemID(item) == e.ItemID { retrievedItem = *item break } @@ -880,6 +996,10 @@ func runRealtimeSession(application *application.Application, t Transport, model if vadServerStarted { close(done) } + // Stop the server-side sound-detection windowing goroutine (if running). + if soundWindowDone != nil { + close(soundWindowDone) + } wg.Wait() // Remove the session from the sessions map @@ -971,6 +1091,10 @@ func updateTransSession(session *Session, update *types.SessionUnion, cl *config session.ModelInterface = m session.ModelConfig = cfg + session.SoundDetectionEnabled = cfg.Pipeline.SoundDetection != "" + if session.SoundDetectionTopK <= 0 { + session.SoundDetectionTopK = defaultSoundDetectionTopK + } } if trUpd != nil { @@ -1311,35 +1435,40 @@ func commitUtterance(ctx context.Context, utt []byte, session *Session, conv *Co // turn wastes only transcription compute, which has no side effects. The // transcript is still emitted to the same peer that sent the audio, which // reveals nothing new to them. - type gateOutcome struct { - allowed bool - matched string - reason string - err error + // Resolve the speaker when the gate must authorize this turn, or when identity + // surfacing/personalization needs a fresh identity. Identity resolution + // ignores the when:first short-circuit (that only skips re-authorization). 
+ type resolveOutcome struct { + res resolution + err error } - var gateCh chan gateOutcome - runGate := false + var resolveCh chan resolveOutcome + runResolve := false if session.voiceGate != nil && session.InputAudioTranscription != nil { - skip := false - if session.voiceGate.cfg.When == config.VoiceGateWhenFirst { + enforce := session.voiceGate.cfg.EnforceGate() + gateNeedsAuth := enforce + if enforce && session.voiceGate.cfg.When == config.VoiceGateWhenFirst { session.gateMu.Lock() - skip = session.voiceVerified + if session.voiceVerified { + gateNeedsAuth = false + } session.gateMu.Unlock() } - if !skip { - runGate = true - gateCh = make(chan gateOutcome, 1) + if gateNeedsAuth || session.voiceGate.cfg.IdentityEnabled() { + runResolve = true + resolveCh = make(chan resolveOutcome, 1) wavPath := f.Name() go func() { - allowed, matched, reason, gerr := session.voiceGate.Authorize(ctx, wavPath) - gateCh <- gateOutcome{allowed: allowed, matched: matched, reason: reason, err: gerr} + r, rerr := session.voiceGate.Resolve(ctx, wavPath) + resolveCh <- resolveOutcome{res: r, err: rerr} }() } } // TODO: If we have a real any-to-any model then transcription is optional var transcript string - if session.InputAudioTranscription != nil { + switch { + case session.InputAudioTranscription != nil: // emitTranscription streams transcript deltas when // pipeline.streaming.transcription is set, otherwise emits a single // completed event; either way it returns the final transcript text. @@ -1348,55 +1477,169 @@ func commitUtterance(ctx context.Context, utt []byte, session *Session, conv *Co if err != nil { // Drain the gate goroutine before returning so its in-flight read of // the temp WAV finishes before the deferred os.Remove fires. 
- if runGate { - <-gateCh + if runResolve { + <-resolveCh } sendError(t, "transcription_failed", err.Error(), "", "event_TODO") return } - } else { + case session.SoundDetectionEnabled: + // Sound-detection-only session: no transcription and no LLM. The + // sound-detection emit below carries the result; there is no any-to-any + // path to fall into. Windowing is client-driven (turn_detection none + + // input_audio_buffer.commit), so this is not voice-gated. + default: // The voice gate runs only on the transcription path above; if an // any-to-any model path is added here, join the gate before responding. sendNotImplemented(t, "any-to-any models") return } - // Join on the gate before any side-effecting step. - if runGate { - out := <-gateCh - allowed := out.allowed - reason := out.reason - if out.err != nil { - // Fail closed: a gate that cannot decide must not let audio through. - xlog.Error("voice recognition gate error", "error", out.err) - allowed = false - reason = "verification error" - } - alreadyVerified := false - if session.voiceGate.cfg.When == config.VoiceGateWhenFirst { - session.gateMu.Lock() - alreadyVerified = session.voiceVerified - session.gateMu.Unlock() - } - proceed, markVerified := session.voiceGate.decide(alreadyVerified, allowed) - if !proceed { - xlog.Debug("voice recognition gate rejected utterance", "reason", reason) - if session.voiceGate.cfg.OnReject == config.VoiceGateRejectEvent { - sendError(t, "speaker_not_authorized", "speaker not authorized: "+reason, "", "event_TODO") - } - return - } - xlog.Debug("voice recognition gate authorized utterance", "speaker", out.matched) - if markVerified { - session.gateMu.Lock() - session.voiceVerified = true - session.gateMu.Unlock() + // Sound-event detection is additive to transcription: classify the same + // committed window and emit its scored AudioSet tags as a separate event. + // A failure here is logged but must never abort the turn. 
+ if session.SoundDetectionEnabled { + if sderr := emitSoundDetection(ctx, t, session, generateItemID(), f.Name()); sderr != nil { + xlog.Error("sound detection failed", "error", sderr) } } - if !session.TranscriptionOnly { - generateResponse(ctx, session, utt, transcript, conv, t) + // Join on the resolution before any side-effecting step. + var speaker *types.Speaker + if runResolve { + out := <-resolveCh + enforce := session.voiceGate.cfg.EnforceGate() + + if out.err != nil { + if enforce { + // Fail closed: a gate that cannot decide must not let audio through. + xlog.Error("voice recognition gate error", "error", out.err) + if session.voiceGate.cfg.OnReject == config.VoiceGateRejectEvent { + sendError(t, "speaker_not_authorized", "speaker not authorized: verification error", "", "event_TODO") + } + return + } + // Non-enforcing: degrade to an unknown speaker and continue. + xlog.Warn("voice identity resolve failed; continuing as unknown speaker", "error", out.err) + } else { + s := out.res.speaker + speaker = &s + } + + if enforce { + alreadyVerified := false + if session.voiceGate.cfg.When == config.VoiceGateWhenFirst { + session.gateMu.Lock() + alreadyVerified = session.voiceVerified + session.gateMu.Unlock() + } + allowed, reason := false, "verification error" + if out.err == nil { + allowed, reason = session.voiceGate.authorize(out.res) + } + proceed, markVerified := session.voiceGate.decide(alreadyVerified, allowed) + if !proceed { + xlog.Debug("voice recognition gate rejected utterance", "reason", reason) + if session.voiceGate.cfg.OnReject == config.VoiceGateRejectEvent { + sendError(t, "speaker_not_authorized", "speaker not authorized: "+reason, "", "event_TODO") + } + return + } + if markVerified { + session.gateMu.Lock() + session.voiceVerified = true + session.gateMu.Unlock() + } + xlog.Debug("voice recognition gate authorized utterance", "speaker", out.res.speaker.Name) + } } + + // Generate an LLM response only when there is a transcript to feed 
it. A + // sound-detection-only session (no transcription) has no LLM stage, so it + // stops here after emitting the sound-detection event. + if session.InputAudioTranscription != nil && !session.TranscriptionOnly { + generateResponse(ctx, session, utt, transcript, speaker, conv, t) + } +} + +// handleSoundWindow runs server-side windowed sound-event detection (option B): +// every HopMs it classifies the last WindowMs of streamed audio and emits a +// sound_detection event, so a sound-only client only has to stream audio (no +// input_audio_buffer.commit). It keeps the input buffer trimmed to one window +// so a long stream stays bounded. Runs until done is closed. This is +// independent of VAD: sound events are not speech. +func handleSoundWindow(session *Session, t Transport, done chan struct{}) { + ticker := time.NewTicker(time.Duration(session.SoundDetectionHopMs) * time.Millisecond) + defer ticker.Stop() + + for { + select { + case <-done: + return + case <-ticker.C: + classifySoundWindow(session, t) + } + } +} + +// classifySoundWindow is one windowing tick: it snapshots the most recent +// WindowMs of buffered audio (trimming the buffer so a long stream stays +// bounded) and, when there is enough, classifies it and emits a sound_detection +// event. Extracted from handleSoundWindow so it can be driven synchronously in +// tests. +func classifySoundWindow(session *Session, t Transport) { + const bytesPerSample = 2 // 16-bit mono PCM + sr := session.InputSampleRate + windowBytes := session.SoundDetectionWindowMs * sr / 1000 * bytesPerSample + minBytes := sr / 100 * bytesPerSample // ~10ms before classifying + + session.AudioBufferLock.Lock() + // Keep only the most recent window so a long stream stays bounded. 
+ if windowBytes > 0 && len(session.InputAudioBuffer) > windowBytes { + trimmed := make([]byte, windowBytes) + copy(trimmed, session.InputAudioBuffer[len(session.InputAudioBuffer)-windowBytes:]) + session.InputAudioBuffer = trimmed + } + window := make([]byte, len(session.InputAudioBuffer)) + copy(window, session.InputAudioBuffer) + session.AudioBufferLock.Unlock() + + if len(window) < minBytes { + return // not enough audio buffered yet + } + path, err := writeWindowWAV(window, sr) + if err != nil { + xlog.Error("sound window: failed to write wav", "error", err) + return + } + if sderr := emitSoundDetection(context.Background(), t, session, generateItemID(), path); sderr != nil { + xlog.Error("sound window: detection failed", "error", sderr) + } + if rerr := os.Remove(path); rerr != nil { + xlog.Debug("sound window: temp cleanup failed", "error", rerr) + } +} + +// writeWindowWAV writes mono 16-bit PCM to a temp WAV at the given sample rate +// (the ced classifier reads the declared rate and resamples). Returns the path; +// the caller removes it. +func writeWindowWAV(pcm []byte, sampleRate int) (string, error) { + f, err := os.CreateTemp("", "realtime-sound-window-*.wav") + if err != nil { + return "", err + } + defer func() { _ = f.Close() }() + hdr := laudio.NewWAVHeaderWithRate(uint32(len(pcm)), uint32(sampleRate)) + if err := hdr.Write(f); err != nil { + _ = os.Remove(f.Name()) + return "", err + } + if _, err := f.Write(pcm); err != nil { + _ = os.Remove(f.Name()) + return "", err + } + _ = f.Sync() + return f.Name(), nil } func runVAD(ctx context.Context, session *Session, adata []int16) ([]schema.VADSegment, error) { @@ -1419,15 +1662,28 @@ func runVAD(ctx context.Context, session *Session, adata []int16) ([]schema.VADS return resp.Segments, nil } +// speakerNote renders the system-prompt note for the current speaker. Returns +// an empty string when there is no name and unknown notes are disabled. 
+func speakerNote(s *types.Speaker, noteUnknown bool) string { + if s != nil && s.Matched && s.Name != "" { + return "The current speaker is " + s.Name + "." + } + if noteUnknown { + return "The current speaker is unknown." + } + return "" +} + // Function to generate a response based on the conversation -func generateResponse(ctx context.Context, session *Session, utt []byte, transcript string, conv *Conversation, t Transport) { +func generateResponse(ctx context.Context, session *Session, utt []byte, transcript string, speaker *types.Speaker, conv *Conversation, t Transport) { xlog.Debug("Generating realtime response...") // Create user message item item := types.MessageItemUnion{ User: &types.MessageItemUser{ - ID: generateItemID(), - Status: types.ItemStatusCompleted, + ID: generateItemID(), + Status: types.ItemStatusCompleted, + Speaker: speaker, Content: []types.MessageContentInput{ { Type: types.MessageContentTypeInputAudio, @@ -1445,6 +1701,17 @@ func generateResponse(ctx context.Context, session *Session, utt []byte, transcr Item: item, }) + // Surface the recognized speaker to the client. Skip the event for an + // unidentified speaker unless announce_unknown is set. + if speaker != nil && session.voiceGate != nil && session.voiceGate.cfg.AnnounceEnabled() { + if speaker.Matched || session.voiceGate.cfg.Identity.AnnounceUnknown { + sendEvent(t, types.ConversationItemSpeakerEvent{ + ItemID: item.User.ID, + Speaker: *speaker, + }) + } + } + triggerResponse(ctx, session, conv, t, nil) } @@ -1456,6 +1723,9 @@ const maxAssistantToolTurns = 10 func triggerResponse(ctx context.Context, session *Session, conv *Conversation, t Transport, overrides *types.ResponseCreateParams) { triggerResponseAtTurn(ctx, session, conv, t, overrides, 0) + // Fold aged-out turns into the rolling memory off the critical path; the + // next turn reaps the smaller buffer. 
+ session.maybeCompact(conv) } func triggerResponseAtTurn(ctx context.Context, session *Session, conv *Conversation, t Transport, overrides *types.ResponseCreateParams, toolTurn int) { @@ -1508,13 +1778,21 @@ func triggerResponseAtTurn(ctx context.Context, session *Session, conv *Conversa }) imgIndex := 0 + var lastUserSpeaker *types.Speaker + personalize := session.voiceGate != nil && session.voiceGate.cfg.PersonalizeEnabled() conv.Lock.Lock() + conversationHistory = withMemory(conversationHistory, conv.Memory) items := trimRealtimeItems(conv.Items, session.MaxHistoryItems) for _, item := range items { if item.User != nil { msg := schema.Message{ Role: string(types.MessageRoleUser), } + lastUserSpeaker = item.User.Speaker + if personalize && session.voiceGate.cfg.Identity.InjectName && + item.User.Speaker != nil && item.User.Speaker.Matched && item.User.Speaker.Name != "" { + msg.Name = item.User.Speaker.Name + } textContent := "" nrOfImgsInMessage := 0 for _, content := range item.User.Content { @@ -1601,6 +1879,13 @@ func triggerResponseAtTurn(ctx context.Context, session *Session, conv *Conversa } conv.Lock.Unlock() + if personalize && session.voiceGate.cfg.Identity.InjectSystemNote { + if note := speakerNote(lastUserSpeaker, session.voiceGate.cfg.Identity.NoteUnknown); note != "" { + conversationHistory[0].StringContent += "\n\n" + note + conversationHistory[0].Content = conversationHistory[0].StringContent + } + } + var images []string for _, m := range conversationHistory { images = append(images, m.StringImages...) 
diff --git a/core/http/endpoints/openai/realtime_compaction.go b/core/http/endpoints/openai/realtime_compaction.go new file mode 100644 index 000000000..f79a2d7a2 --- /dev/null +++ b/core/http/endpoints/openai/realtime_compaction.go @@ -0,0 +1,326 @@ +package openai + +import ( + "context" + "fmt" + "strings" + "time" + + "github.com/mudler/LocalAI/core/config" + "github.com/mudler/LocalAI/core/http/endpoints/openai/types" + "github.com/mudler/LocalAI/core/schema" + "github.com/mudler/LocalAI/pkg/reasoning" + "github.com/mudler/xlog" +) + +const ( + defaultMaxSummaryTokens = 512 + memoryPrefix = "Summary of earlier conversation:\n" + // compactionTimeout bounds the summarizer call so a stuck model can't pin the + // compacting flag (and thus block all further compaction) forever. + compactionTimeout = 60 * time.Second +) + +// withMemory inserts the rolling summary as a system message after the existing +// (instructions) history. No-op when memory is empty. +func withMemory(history schema.Messages, memory string) schema.Messages { + if memory == "" { + return history + } + content := memoryPrefix + memory + return append(history, schema.Message{ + Role: string(types.MessageRoleSystem), + StringContent: content, + Content: content, + }) +} + +// renderItemsTranscript renders conversation items as a plain "role: text" +// transcript for summarization. Non-text items (bare tool calls) are labelled +// so the summarizer keeps track of actions taken. 
+func renderItemsTranscript(items []*types.MessageItemUnion) string { + var b strings.Builder + for _, item := range items { + switch { + case item.User != nil: + b.WriteString("user: ") + for _, c := range item.User.Content { + if c.Text != "" { + b.WriteString(c.Text) + } + if c.Transcript != "" { + b.WriteString(c.Transcript) + } + } + b.WriteString("\n") + case item.Assistant != nil: + b.WriteString("assistant: ") + // Realtime assistant *audio* turns store the spoken words in + // .Transcript (not .Text), so emit both or spoken turns are dropped. + for _, c := range item.Assistant.Content { + if c.Text != "" { + b.WriteString(c.Text) + } + if c.Transcript != "" { + b.WriteString(c.Transcript) + } + } + b.WriteString("\n") + case item.FunctionCall != nil: + b.WriteString(fmt.Sprintf("assistant called tool %s(%s)\n", item.FunctionCall.Name, item.FunctionCall.Arguments)) + case item.FunctionCallOutput != nil: + b.WriteString(fmt.Sprintf("tool result: %s\n", item.FunctionCallOutput.Output)) + } + } + return strings.TrimSpace(b.String()) +} + +// buildSummaryMessages builds the chat messages for the summarizer LLM: a system +// instruction plus prior memory and the new transcript to fold in. maxTokens is +// advisory (fed to the prompt; not hard-enforced in v1). +func buildSummaryMessages(priorMemory, transcript string, maxTokens int) schema.Messages { + system := fmt.Sprintf("You maintain a running memory of a live voice conversation. "+ + "Merge the prior memory with the new exchanges into an updated memory. "+ + "Keep names, decisions, facts, preferences, and open threads. Be concise "+ + "(under ~%d tokens). 
Output only the updated memory, with no reasoning or tags.", maxTokens) + var user strings.Builder + if priorMemory != "" { + user.WriteString("Prior memory:\n") + user.WriteString(priorMemory) + user.WriteString("\n\n") + } + user.WriteString("New exchanges to fold in:\n") + user.WriteString(transcript) + return schema.Messages{ + {Role: string(types.MessageRoleSystem), StringContent: system, Content: system}, + {Role: string(types.MessageRoleUser), StringContent: user.String(), Content: user.String()}, + } +} + +// clearInputAudio resets the session's pending input audio buffer (the raw +// PCM and any buffered Opus frames). Used by the input_audio_buffer.clear +// realtime event so a client can discard a partially-captured utterance. +func clearInputAudio(s *Session) { + s.AudioBufferLock.Lock() + s.InputAudioBuffer = nil + s.AudioBufferLock.Unlock() + s.OpusFramesLock.Lock() + s.OpusFrames = nil + s.OpusFramesLock.Unlock() +} + +// itemID extracts the id from any MessageItemUnion variant ("" if none). +func itemID(item *types.MessageItemUnion) string { + switch { + case item == nil: + return "" + case item.System != nil: + return item.System.ID + case item.User != nil: + return item.User.ID + case item.Assistant != nil: + return item.Assistant.ID + case item.FunctionCall != nil: + return item.FunctionCall.ID + case item.FunctionCallOutput != nil: + return item.FunctionCallOutput.ID + default: + return "" + } +} + +// deleteItem removes the item with id from items, returning the new slice and +// whether it was found. +func deleteItem(items []*types.MessageItemUnion, id string) ([]*types.MessageItemUnion, bool) { + for i, item := range items { + if itemID(item) == id { + return append(items[:i:i], items[i+1:]...), true + } + } + return items, false +} + +// truncateAssistantText clears the text of the assistant item's content part at +// contentIndex. Minimal truncate: used to discard an interrupted/barge-in +// response tail. 
Both .Text and .Transcript are cleared because realtime audio +// turns store the spoken words in .Transcript (clearing only .Text would no-op). +func truncateAssistantText(items []*types.MessageItemUnion, id string, contentIndex int) bool { + for _, item := range items { + if itemID(item) != id || item.Assistant == nil { + continue + } + if contentIndex >= 0 && contentIndex < len(item.Assistant.Content) { + item.Assistant.Content[contentIndex].Text = "" + item.Assistant.Content[contentIndex].Transcript = "" + } + return true + } + return false +} + +// compactionCut returns the index splitting items into overflow (items[:cut], +// to be summarized+evicted) and the kept live tail (items[cut:]), keeping the +// last `keep` items. It mirrors trimRealtimeItems' pair-safety: the cut is +// pulled left so a function_call and its function_call_output are never split +// across the boundary (the whole pair lands in the kept tail). Returns 0 when +// there is nothing to cut. +func compactionCut(items []*types.MessageItemUnion, keep int) int { + // keep <= 0 means no live-window cap (the "unlimited history" sentinel, as + // in trimRealtimeItems): there is nothing to evict, so cut nothing. This + // also avoids indexing items[len(items)] in the pair-safety loop below. + if keep <= 0 { + return 0 + } + cut := len(items) - keep + if cut <= 0 { + return 0 + } + for cut > 0 && items[cut] != nil && items[cut].FunctionCallOutput != nil { + cut-- + } + return cut +} + +// resolveCompaction reads the pipeline.compaction block, applying defaults and +// the trigger>max_history invariant. maxHistory is the already-resolved live +// window size. Returns enabled=false (and zero values) when compaction is off. 
+func resolveCompaction(cfg *config.ModelConfig, maxHistory int) (enabled bool, trigger, maxSummaryTokens int, summaryModel string) { + if cfg == nil || cfg.Pipeline.Compaction == nil || !cfg.Pipeline.Compaction.Enabled { + return false, 0, 0, "" + } + c := cfg.Pipeline.Compaction + trigger = c.TriggerItems + if trigger <= 0 { + trigger = maxHistory * 2 + } + if trigger <= maxHistory { + trigger = maxHistory + 1 + } + maxSummaryTokens = c.MaxSummaryTokens + if maxSummaryTokens <= 0 { + maxSummaryTokens = defaultMaxSummaryTokens + } + return true, trigger, maxSummaryTokens, c.SummaryModel +} + +// prefixMatches reports whether items begins with the same ids, in order, as +// snapshot — i.e. the overflow we summarized is still at the head (no concurrent +// client delete reshuffled it). +func prefixMatches(items, snapshot []*types.MessageItemUnion) bool { + if len(items) < len(snapshot) { + return false + } + for i := range snapshot { + if itemID(items[i]) != itemID(snapshot[i]) { + return false + } + } + return true +} + +// compact folds overflow items into conv.Memory and evicts them. It never holds +// conv.Lock across the summarizer call: snapshot under lock, summarize unlocked, +// commit under lock (re-validating the head is unchanged). On any error it +// leaves the conversation untouched — items are never dropped without a summary. +func (s *Session) compact(conv *Conversation, model Model) { + if model == nil { + return + } + // Snapshot. + conv.Lock.Lock() + if len(conv.Items) <= s.CompactionTrigger { + conv.Lock.Unlock() + return + } + cut := compactionCut(conv.Items, s.MaxHistoryItems) + if cut <= 0 { + conv.Lock.Unlock() + return + } + overflow := append([]*types.MessageItemUnion(nil), conv.Items[:cut]...) + prior := conv.Memory + conv.Lock.Unlock() + + // Summarize (unlocked). 
+ msgs := buildSummaryMessages(prior, renderItemsTranscript(overflow), s.MaxSummaryTokens) + ctx, cancel := context.WithTimeout(context.Background(), compactionTimeout) + defer cancel() + predFunc, err := model.Predict(ctx, msgs, nil, nil, nil, nil, nil, nil, nil, nil, nil) + if err != nil { + xlog.Warn("realtime compaction: summarizer predict failed", "error", err) + return + } + pred, err := predFunc() + if err != nil { + xlog.Warn("realtime compaction: summarizer inference failed", "error", err) + return + } + // Strip any leaked reasoning/thinking spans using the same extractor the + // rest of the realtime path uses, rather than a bespoke regex. + rcfg := reasoning.Config{} + if mc := model.PredictConfig(); mc != nil { + rcfg = spokenReasoningConfig(mc.ReasoningConfig) + } + _, summary := reasoning.ExtractReasoningComplete(pred.Response, "", rcfg) + summary = strings.TrimSpace(summary) + if summary == "" { + xlog.Warn("realtime compaction: empty summary, skipping eviction") + return + } + + // Commit. + conv.Lock.Lock() + defer conv.Lock.Unlock() + if !prefixMatches(conv.Items, overflow) { + xlog.Debug("realtime compaction: head changed during summary, skipping") + return + } + conv.Memory = summary + conv.Items = conv.Items[len(overflow):] + xlog.Debug("realtime compaction: evicted items into memory", "evicted", len(overflow), "remaining", len(conv.Items)) +} + +// summarizerModel resolves the model used to produce compaction summaries. +// Without a configured summary_model (or factory) it reuses the pipeline LLM. 
+func (s *Session) summarizerModel() Model { + if s.SummaryModel == "" || s.summarizerFactory == nil { + return s.ModelInterface + } + s.summarizerOnce.Do(func() { + m, err := s.summarizerFactory() + if err != nil { + xlog.Warn("realtime compaction: summary_model load failed, falling back to pipeline LLM", "model", s.SummaryModel, "error", err) + m = s.ModelInterface + } + s.summarizerCached = m + }) + return s.summarizerCached +} + +// maybeCompact schedules a background compaction when the live buffer has grown +// past the trigger and none is already running. Returns immediately. +func (s *Session) maybeCompact(conv *Conversation) { + if !s.CompactionEnabled { + return + } + conv.Lock.Lock() + over := len(conv.Items) > s.CompactionTrigger + conv.Lock.Unlock() + if !over { + return + } + if !conv.compacting.CompareAndSwap(false, true) { + return + } + go func() { + defer conv.compacting.Store(false) + // Resolve (and, for a configured summary_model, lazily load) the + // summarizer only when a compaction actually runs, off the response + // path — so the model load never blocks a user turn. + model := s.summarizerModel() + if model == nil { + return + } + s.compact(conv, model) + }() +} diff --git a/core/http/endpoints/openai/realtime_compaction_test.go b/core/http/endpoints/openai/realtime_compaction_test.go new file mode 100644 index 000000000..5b19a8259 --- /dev/null +++ b/core/http/endpoints/openai/realtime_compaction_test.go @@ -0,0 +1,308 @@ +package openai + +import ( + "errors" + + . "github.com/onsi/ginkgo/v2" + . 
"github.com/onsi/gomega" + + "github.com/mudler/LocalAI/core/backend" + "github.com/mudler/LocalAI/core/config" + "github.com/mudler/LocalAI/core/http/endpoints/openai/types" + "github.com/mudler/LocalAI/core/schema" +) + +var _ = Describe("resolveCompaction", func() { + It("disables when the block is absent", func() { + enabled, _, _, _ := resolveCompaction(&config.ModelConfig{}, 6) + Expect(enabled).To(BeFalse()) + }) + + It("defaults trigger to 2x max history and tokens to 512", func() { + cfg := &config.ModelConfig{Pipeline: config.Pipeline{Compaction: &config.PipelineCompaction{Enabled: true}}} + enabled, trigger, maxTok, _ := resolveCompaction(cfg, 6) + Expect(enabled).To(BeTrue()) + Expect(trigger).To(Equal(12)) + Expect(maxTok).To(Equal(512)) + }) + + It("clamps trigger to max history + 1 when misconfigured", func() { + cfg := &config.ModelConfig{Pipeline: config.Pipeline{Compaction: &config.PipelineCompaction{Enabled: true, TriggerItems: 4}}} + _, trigger, _, _ := resolveCompaction(cfg, 6) + Expect(trigger).To(Equal(7)) + }) + + It("honors explicit values", func() { + cfg := &config.ModelConfig{Pipeline: config.Pipeline{Compaction: &config.PipelineCompaction{ + Enabled: true, TriggerItems: 20, MaxSummaryTokens: 256, SummaryModel: "tiny"}}} + enabled, trigger, maxTok, model := resolveCompaction(cfg, 6) + Expect(enabled).To(BeTrue()) + Expect(trigger).To(Equal(20)) + Expect(maxTok).To(Equal(256)) + Expect(model).To(Equal("tiny")) + }) +}) + +var _ = Describe("deleteItem", func() { + mk := func(ids ...string) []*types.MessageItemUnion { + out := make([]*types.MessageItemUnion, len(ids)) + for i, id := range ids { + out[i] = &types.MessageItemUnion{User: &types.MessageItemUser{ID: id}} + } + return out + } + + It("removes the item with the given id", func() { + items, ok := deleteItem(mk("a", "b", "c"), "b") + Expect(ok).To(BeTrue()) + Expect(len(items)).To(Equal(2)) + Expect(itemID(items[0])).To(Equal("a")) + Expect(itemID(items[1])).To(Equal("c")) + }) + + 
It("reports not found for an unknown id", func() { + _, ok := deleteItem(mk("a"), "zzz") + Expect(ok).To(BeFalse()) + }) +}) + +var _ = Describe("clearInputAudio", func() { + It("resets the pending PCM and buffered Opus frames", func() { + s := &Session{InputAudioBuffer: []byte{1, 2, 3}, OpusFrames: [][]byte{{9}}} + clearInputAudio(s) + Expect(s.InputAudioBuffer).To(BeNil()) + Expect(s.OpusFrames).To(BeNil()) + }) +}) + +var _ = Describe("truncateAssistantText", func() { + It("clears the text of the assistant content part at the index", func() { + items := []*types.MessageItemUnion{{Assistant: &types.MessageItemAssistant{ + ID: "a1", + Content: []types.MessageContentOutput{{Type: types.MessageContentTypeText, Text: "hello world"}}, + }}} + ok := truncateAssistantText(items, "a1", 0) + Expect(ok).To(BeTrue()) + Expect(items[0].Assistant.Content[0].Text).To(Equal("")) + }) + + // Realtime assistant *audio* turns store the spoken words in .Transcript, not + // .Text, so a barge-in truncate must clear .Transcript too or it would no-op. 
+ It("clears the transcript of an assistant audio content part", func() { + items := []*types.MessageItemUnion{{Assistant: &types.MessageItemAssistant{ + ID: "a1", + Content: []types.MessageContentOutput{{Type: types.MessageContentTypeAudio, Transcript: "hello world"}}, + }}} + ok := truncateAssistantText(items, "a1", 0) + Expect(ok).To(BeTrue()) + Expect(items[0].Assistant.Content[0].Transcript).To(Equal("")) + }) + + It("returns false for an unknown id", func() { + Expect(truncateAssistantText(nil, "nope", 0)).To(BeFalse()) + }) +}) + +var _ = Describe("compactionCut", func() { + user := func(id string) *types.MessageItemUnion { + return &types.MessageItemUnion{User: &types.MessageItemUser{ID: id}} + } + call := func(id string) *types.MessageItemUnion { + return &types.MessageItemUnion{FunctionCall: &types.MessageItemFunctionCall{ID: id}} + } + out := func(id string) *types.MessageItemUnion { + return &types.MessageItemUnion{FunctionCallOutput: &types.MessageItemFunctionCallOutput{ID: id}} + } + + It("cuts exactly len-keep when no pairs straddle the boundary", func() { + items := []*types.MessageItemUnion{user("1"), user("2"), user("3"), user("4")} + Expect(compactionCut(items, 2)).To(Equal(2)) + }) + + It("returns 0 when nothing to cut", func() { + Expect(compactionCut([]*types.MessageItemUnion{user("1")}, 2)).To(Equal(0)) + }) + + It("returns 0 (cuts nothing) when keep is 0 — the unlimited-window sentinel", func() { + items := []*types.MessageItemUnion{user("1"), user("2"), user("3")} + Expect(compactionCut(items, 0)).To(Equal(0)) + }) + + It("moves the boundary so a call/output pair is not split", func() { + // keep=2 -> naive cut=2, but items[2] is the output of items[1]'s call; + // pull the cut left (to 1) so the whole pair stays in the kept tail.
+ items := []*types.MessageItemUnion{user("1"), call("c"), out("c"), user("4")} + Expect(compactionCut(items, 2)).To(Equal(1)) + }) +}) + +var _ = Describe("withMemory", func() { + It("inserts a memory system message when memory is non-empty", func() { + base := schema.Messages{{Role: "system", StringContent: "instructions"}} + out := withMemory(base, "user is Bob; wants pizza") + Expect(len(out)).To(Equal(2)) + Expect(out[1].Role).To(Equal("system")) + Expect(out[1].StringContent).To(ContainSubstring("user is Bob")) + Expect(out[1].StringContent).To(ContainSubstring("Summary of earlier conversation")) + }) + + It("is a no-op when memory is empty", func() { + base := schema.Messages{{Role: "system", StringContent: "instructions"}} + Expect(withMemory(base, "")).To(HaveLen(1)) + }) +}) + +var _ = Describe("renderItemsTranscript", func() { + It("renders user and assistant text turns", func() { + items := []*types.MessageItemUnion{ + {User: &types.MessageItemUser{Content: []types.MessageContentInput{{Type: types.MessageContentTypeInputText, Text: "hi"}}}}, + {Assistant: &types.MessageItemAssistant{Content: []types.MessageContentOutput{{Type: types.MessageContentTypeText, Text: "hello"}}}}, + } + out := renderItemsTranscript(items) + Expect(out).To(ContainSubstring("user: hi")) + Expect(out).To(ContainSubstring("assistant: hello")) + }) + + // Realtime assistant *audio* turns store the spoken words in .Transcript, not + // .Text, so the transcript builder must emit .Transcript too or spoken turns + // would be dropped from the summary. 
+ It("renders an assistant audio turn from its transcript", func() { + items := []*types.MessageItemUnion{ + {Assistant: &types.MessageItemAssistant{Content: []types.MessageContentOutput{{Type: types.MessageContentTypeAudio, Transcript: "spoken words"}}}}, + } + Expect(renderItemsTranscript(items)).To(ContainSubstring("assistant: spoken words")) + }) +}) + +var _ = Describe("buildSummaryMessages", func() { + It("includes prior memory and the new transcript", func() { + msgs := buildSummaryMessages("prior facts", "user: hi", 512) + Expect(len(msgs)).To(Equal(2)) + Expect(msgs[0].Role).To(Equal("system")) + Expect(msgs[1].StringContent).To(ContainSubstring("prior facts")) + Expect(msgs[1].StringContent).To(ContainSubstring("user: hi")) + }) +}) + +var _ = Describe("compact", func() { + user := func(id, text string) *types.MessageItemUnion { + return &types.MessageItemUnion{User: &types.MessageItemUser{ID: id, + Content: []types.MessageContentInput{{Type: types.MessageContentTypeInputText, Text: text}}}} + } + + It("summarizes overflow into Memory and evicts it, keeping the live tail", func() { + conv := &Conversation{Items: []*types.MessageItemUnion{ + user("1", "a"), user("2", "b"), user("3", "c"), user("4", "d"), + user("5", "e"), user("6", "f"), user("7", "g"), user("8", "h"), + }} + s := &Session{CompactionEnabled: true, CompactionTrigger: 7, MaxHistoryItems: 4, MaxSummaryTokens: 512} + m := &fakeModel{predictResp: backend.LLMResponse{Response: "ROLLED UP"}} + + s.compact(conv, m) + + Expect(conv.Memory).To(Equal("ROLLED UP")) + Expect(len(conv.Items)).To(Equal(4)) + Expect(itemID(conv.Items[0])).To(Equal("5")) + // The summarizer saw the evicted turns. 
+ Expect(m.lastMessages[1].StringContent).To(ContainSubstring("a")) + }) + + It("leaves Items and Memory untouched when the summarizer errors", func() { + items := []*types.MessageItemUnion{user("1", "a"), user("2", "b"), user("3", "c")} + conv := &Conversation{Items: items} + s := &Session{CompactionEnabled: true, CompactionTrigger: 2, MaxHistoryItems: 1, MaxSummaryTokens: 512} + m := &fakeModel{predictErr: errors.New("boom")} + + s.compact(conv, m) + + Expect(conv.Memory).To(Equal("")) + Expect(len(conv.Items)).To(Equal(3)) + }) + + It("strips leaked reasoning tags from the summary via the shared extractor", func() { + conv := &Conversation{Items: []*types.MessageItemUnion{ + user("1", "a"), user("2", "b"), user("3", "c"), user("4", "d"), + user("5", "e"), user("6", "f"), user("7", "g"), user("8", "h"), + }} + s := &Session{CompactionEnabled: true, CompactionTrigger: 7, MaxHistoryItems: 4, MaxSummaryTokens: 512} + m := &fakeModel{predictResp: backend.LLMResponse{Response: "planning the summaryCLEAN SUMMARY"}} + + s.compact(conv, m) + + Expect(conv.Memory).To(Equal("CLEAN SUMMARY")) + Expect(conv.Memory).ToNot(ContainSubstring("planning")) + }) + + It("does nothing when items are at or below the trigger", func() { + conv := &Conversation{Items: []*types.MessageItemUnion{user("1", "a")}} + s := &Session{CompactionEnabled: true, CompactionTrigger: 7, MaxHistoryItems: 4} + s.compact(conv, &fakeModel{predictResp: backend.LLMResponse{Response: "x"}}) + Expect(conv.Memory).To(Equal("")) + Expect(len(conv.Items)).To(Equal(1)) + }) +}) + +var _ = Describe("prefixMatches", func() { + user := func(id string) *types.MessageItemUnion { + return &types.MessageItemUnion{User: &types.MessageItemUser{ID: id}} + } + + It("matches when items begins with the snapshot ids in order", func() { + items := []*types.MessageItemUnion{user("1"), user("2"), user("3")} + snap := []*types.MessageItemUnion{user("1"), user("2")} + Expect(prefixMatches(items, snap)).To(BeTrue()) + }) + + 
It("matches an empty snapshot", func() { + Expect(prefixMatches([]*types.MessageItemUnion{user("1")}, nil)).To(BeTrue()) + }) + + It("fails when items is shorter than the snapshot (a concurrent delete shrank the head)", func() { + items := []*types.MessageItemUnion{user("1")} + snap := []*types.MessageItemUnion{user("1"), user("2")} + Expect(prefixMatches(items, snap)).To(BeFalse()) + }) + + It("fails when the head ids differ (a concurrent delete shifted the head)", func() { + items := []*types.MessageItemUnion{user("2"), user("3")} + snap := []*types.MessageItemUnion{user("1"), user("2")} + Expect(prefixMatches(items, snap)).To(BeFalse()) + }) +}) + +var _ = Describe("summarizerModel", func() { + It("returns the pipeline model when no summary_model is set", func() { + m := &fakeModel{} + s := &Session{ModelInterface: m} + Expect(s.summarizerModel()).To(Equal(m)) + }) + + It("uses the factory (once) when summary_model is set", func() { + pipeline := &fakeModel{} + small := &fakeModel{} + calls := 0 + s := &Session{ModelInterface: pipeline, SummaryModel: "tiny", + summarizerFactory: func() (Model, error) { calls++; return small, nil }} + Expect(s.summarizerModel()).To(Equal(small)) + Expect(s.summarizerModel()).To(Equal(small)) + Expect(calls).To(Equal(1)) + }) + + It("falls back to the pipeline model when the factory errors", func() { + pipeline := &fakeModel{} + s := &Session{ModelInterface: pipeline, SummaryModel: "tiny", + summarizerFactory: func() (Model, error) { return nil, errors.New("nope") }} + Expect(s.summarizerModel()).To(Equal(pipeline)) + }) +}) + +var _ = Describe("itemID", func() { + It("returns the id for each variant and empty for nil", func() { + Expect(itemID(nil)).To(Equal("")) + Expect(itemID(&types.MessageItemUnion{User: &types.MessageItemUser{ID: "u1"}})).To(Equal("u1")) + Expect(itemID(&types.MessageItemUnion{Assistant: &types.MessageItemAssistant{ID: "a1"}})).To(Equal("a1")) + Expect(itemID(&types.MessageItemUnion{System:
&types.MessageItemSystem{ID: "s1"}})).To(Equal("s1")) + Expect(itemID(&types.MessageItemUnion{FunctionCall: &types.MessageItemFunctionCall{ID: "f1"}})).To(Equal("f1")) + Expect(itemID(&types.MessageItemUnion{FunctionCallOutput: &types.MessageItemFunctionCallOutput{ID: "o1"}})).To(Equal("o1")) + }) +}) diff --git a/core/http/endpoints/openai/realtime_doubles_test.go b/core/http/endpoints/openai/realtime_doubles_test.go index accd6af51..10e608c17 100644 --- a/core/http/endpoints/openai/realtime_doubles_test.go +++ b/core/http/endpoints/openai/realtime_doubles_test.go @@ -75,6 +75,11 @@ type fakeModel struct { transcribeDeltas []string transcribeFinal *schema.TranscriptionResult + // soundDetectionResult/soundDetectionErr drive the SoundDetection double so + // the sound-event path can be exercised deterministically. + soundDetectionResult *schema.SoundClassificationResult + soundDetectionErr error + // Predict streaming: predictTokens are replayed through the token callback // (simulating streamed LLM output); predictResp/predictErr are returned by // the deferred predict function. 
predictChunkDeltas, when set, are delivered @@ -83,6 +88,8 @@ type fakeModel struct { predictChunkDeltas [][]*proto.ChatDelta predictResp backend.LLMResponse predictErr error + + lastMessages schema.Messages } func (m *fakeModel) VAD(context.Context, *schema.VADRequest) (*schema.VADResponse, error) { @@ -93,7 +100,15 @@ func (m *fakeModel) Transcribe(context.Context, string, string, bool, bool, stri return m.transcribeFinal, nil } -func (m *fakeModel) Predict(_ context.Context, _ schema.Messages, _, _, _ []string, cb func(string, backend.TokenUsage) bool, _ []types.ToolUnion, _ *types.ToolChoiceUnion, _, _ *int, _ map[string]float64) (func() (backend.LLMResponse, error), error) { +func (m *fakeModel) SoundDetection(context.Context, string, int, float32) (*schema.SoundClassificationResult, error) { + if m.soundDetectionErr != nil { + return nil, m.soundDetectionErr + } + return m.soundDetectionResult, nil +} + +func (m *fakeModel) Predict(_ context.Context, msgs schema.Messages, _, _, _ []string, cb func(string, backend.TokenUsage) bool, _ []types.ToolUnion, _ *types.ToolChoiceUnion, _, _ *int, _ map[string]float64) (func() (backend.LLMResponse, error), error) { + m.lastMessages = msgs if m.predictErr != nil { return nil, m.predictErr } diff --git a/core/http/endpoints/openai/realtime_model.go b/core/http/endpoints/openai/realtime_model.go index 789ce0a0d..0dafa0a35 100644 --- a/core/http/endpoints/openai/realtime_model.go +++ b/core/http/endpoints/openai/realtime_model.go @@ -31,10 +31,11 @@ var ( // This means that we will fake an Any-to-Any model by overriding some of the gRPC client methods // which are for Any-To-Any models, but instead we will call a pipeline (for e.g STT->LLM->TTS) type wrappedModel struct { - TTSConfig *config.ModelConfig - TranscriptionConfig *config.ModelConfig - LLMConfig *config.ModelConfig - VADConfig *config.ModelConfig + TTSConfig *config.ModelConfig + TranscriptionConfig *config.ModelConfig + LLMConfig *config.ModelConfig + VADConfig 
*config.ModelConfig + SoundDetectionConfig *config.ModelConfig appConfig *config.ApplicationConfig modelLoader *model.ModelLoader @@ -64,8 +65,9 @@ type anyToAnyModel struct { } type transcriptOnlyModel struct { - TranscriptionConfig *config.ModelConfig - VADConfig *config.ModelConfig + TranscriptionConfig *config.ModelConfig + VADConfig *config.ModelConfig + SoundDetectionConfig *config.ModelConfig appConfig *config.ApplicationConfig modelLoader *model.ModelLoader @@ -80,6 +82,10 @@ func (m *transcriptOnlyModel) Transcribe(ctx context.Context, audio, language st return backend.ModelTranscription(ctx, audio, language, translate, diarize, prompt, m.modelLoader, *m.TranscriptionConfig, m.appConfig) } +func (m *transcriptOnlyModel) SoundDetection(ctx context.Context, audio string, topK int, threshold float32) (*schema.SoundClassificationResult, error) { + return modelSoundDetection(ctx, m.modelLoader, m.appConfig, m.SoundDetectionConfig, audio, topK, threshold) +} + func (m *transcriptOnlyModel) Predict(ctx context.Context, messages schema.Messages, images, videos, audios []string, tokenCallback func(string, backend.TokenUsage) bool, tools []types.ToolUnion, toolChoice *types.ToolChoiceUnion, logprobs *int, topLogprobs *int, logitBias map[string]float64) (func() (backend.LLMResponse, error), error) { return nil, fmt.Errorf("predict operation not supported in transcript-only mode") } @@ -108,6 +114,10 @@ func (m *wrappedModel) Transcribe(ctx context.Context, audio, language string, t return backend.ModelTranscription(ctx, audio, language, translate, diarize, prompt, m.modelLoader, *m.TranscriptionConfig, m.appConfig) } +func (m *wrappedModel) SoundDetection(ctx context.Context, audio string, topK int, threshold float32) (*schema.SoundClassificationResult, error) { + return modelSoundDetection(ctx, m.modelLoader, m.appConfig, m.SoundDetectionConfig, audio, topK, threshold) +} + func (m *wrappedModel) Predict(ctx context.Context, messages schema.Messages, images, videos, 
audios []string, tokenCallback func(string, backend.TokenUsage) bool, tools []types.ToolUnion, toolChoice *types.ToolChoiceUnion, logprobs *int, topLogprobs *int, logitBias map[string]float64) (func() (backend.LLMResponse, error), error) { input := schema.OpenAIRequest{ Messages: messages, @@ -399,8 +409,41 @@ func transcribeStream(ctx context.Context, ml *model.ModelLoader, transcriptionC return final, nil } +// modelSoundDetection runs sound-event classification against the session's +// sound-classification model config, mirroring how Transcribe dispatches to +// the transcription backend. Returns an error when no sound-detection model is +// configured for the session. +func modelSoundDetection(ctx context.Context, ml *model.ModelLoader, appConfig *config.ApplicationConfig, soundConfig *config.ModelConfig, audio string, topK int, threshold float32) (*schema.SoundClassificationResult, error) { + if soundConfig == nil { + return nil, fmt.Errorf("sound detection is not configured for this session") + } + return backend.ModelSoundDetection(ctx, backend.SoundDetectionRequest{ + Audio: audio, + TopK: int32(topK), + Threshold: threshold, + }, ml, *soundConfig, appConfig) +} + +// loadSoundDetectionConfig resolves the optional sound-classification model +// config named by pipeline.sound_detection. Returns (nil, nil) when no model +// is configured so sound detection stays additive and never blocks session +// setup. 
+func loadSoundDetectionConfig(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model.ModelLoader) (*config.ModelConfig, error) { + if pipeline.SoundDetection == "" { + return nil, nil + } + cfg, err := loadPipelineSubModel(cl, pipeline.SoundDetection, ml.ModelPath) + if err != nil { + return nil, fmt.Errorf("failed to load sound detection config: %w", err) + } + if valid, _ := cfg.Validate(); !valid { + return nil, fmt.Errorf("failed to validate sound detection config %q", pipeline.SoundDetection) + } + return cfg, nil +} + func newTranscriptionOnlyModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) (Model, *config.ModelConfig, error) { - cfgVAD, err := cl.LoadModelConfigFileByName(pipeline.VAD, ml.ModelPath) + cfgVAD, err := loadPipelineSubModel(cl, pipeline.VAD, ml.ModelPath) if err != nil { return nil, nil, fmt.Errorf("failed to load backend config: %w", err) @@ -410,7 +453,7 @@ func newTranscriptionOnlyModel(pipeline *config.Pipeline, cl *config.ModelConfig return nil, nil, fmt.Errorf("failed to validate config: %w", err) } - cfgSST, err := cl.LoadModelConfigFileByName(pipeline.Transcription, ml.ModelPath) + cfgSST, err := loadPipelineSubModel(cl, pipeline.Transcription, ml.ModelPath) if err != nil { return nil, nil, fmt.Errorf("failed to load backend config: %w", err) @@ -420,9 +463,15 @@ func newTranscriptionOnlyModel(pipeline *config.Pipeline, cl *config.ModelConfig return nil, nil, fmt.Errorf("failed to validate config: %w", err) } + cfgSound, err := loadSoundDetectionConfig(pipeline, cl, ml) + if err != nil { + return nil, nil, err + } + return &transcriptOnlyModel{ - TranscriptionConfig: cfgSST, - VADConfig: cfgVAD, + TranscriptionConfig: cfgSST, + VADConfig: cfgVAD, + SoundDetectionConfig: cfgSound, confLoader: cl, modelLoader: ml, @@ -430,6 +479,27 @@ func newTranscriptionOnlyModel(pipeline *config.Pipeline, cl *config.ModelConfig }, cfgSST, nil } +// 
newSoundDetectionOnlyModel builds a realtime model that only does sound-event +// classification: no VAD, transcription, LLM or TTS stages are loaded. Used for +// a sound-detection-only realtime session, which activates on sounds (not +// speech) and is driven by client-side windowing (turn_detection none + +// input_audio_buffer.commit) rather than the voice VAD loop. +func newSoundDetectionOnlyModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) (Model, error) { + cfgSound, err := loadSoundDetectionConfig(pipeline, cl, ml) + if err != nil { + return nil, err + } + if cfgSound == nil { + return nil, fmt.Errorf("a sound-only realtime session requires pipeline.sound_detection") + } + return &transcriptOnlyModel{ + SoundDetectionConfig: cfgSound, + confLoader: cl, + modelLoader: ml, + appConfig: appConfig, + }, nil +} + // RealtimeRoutingContext is the bundle of routing dependencies the // realtime pipeline needs to consult router.Resolve per turn. nil-safe: // passing nil skips routing entirely and preserves the historical "one @@ -472,11 +542,30 @@ func buildRealtimeRoutingContext(a *application.Application, sessionID string) * } } +// loadPipelineSubModel loads a pipeline sub-model config by name and follows a +// single alias hop, so a pipeline that references an alias (e.g. `llm: default`) +// gets the alias target's full config (Backend, Model, ...) rather than the +// alias stub with an empty Backend. Without this the alias survives unresolved +// into model loading and fails downstream — notably in distributed mode with +// "backend name is empty". Mirrors the top-level alias resolution in +// core/http/middleware/request.go. 
+func loadPipelineSubModel(cl *config.ModelConfigLoader, name, modelPath string) (*config.ModelConfig, error) { + cfg, err := cl.LoadModelConfigFileByName(name, modelPath) + if err != nil { + return nil, err + } + resolved, _, err := cl.ResolveAlias(cfg) + if err != nil { + return nil, err + } + return resolved, nil +} + // returns and loads either a wrapped model or a model that support audio-to-audio func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig, evaluator *templates.Evaluator, routing *RealtimeRoutingContext) (Model, error) { xlog.Debug("Creating new model pipeline model", "pipeline", pipeline) - cfgVAD, err := cl.LoadModelConfigFileByName(pipeline.VAD, ml.ModelPath) + cfgVAD, err := loadPipelineSubModel(cl, pipeline.VAD, ml.ModelPath) if err != nil { return nil, fmt.Errorf("failed to load backend config: %w", err) @@ -487,7 +576,7 @@ func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model } // TODO: Do we always need a transcription model? It can be disabled. 
Note that any-to-any instruction following models don't transcribe as such, so if transcription is required it is a separate process - cfgSST, err := cl.LoadModelConfigFileByName(pipeline.Transcription, ml.ModelPath) + cfgSST, err := loadPipelineSubModel(cl, pipeline.Transcription, ml.ModelPath) if err != nil { return nil, fmt.Errorf("failed to load backend config: %w", err) @@ -519,7 +608,7 @@ func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model xlog.Debug("Loading a wrapped model") // Otherwise we want to return a wrapped model, which is a "virtual" model that re-uses other models to perform operations - cfgLLM, err := cl.LoadModelConfigFileByName(pipeline.LLM, ml.ModelPath) + cfgLLM, err := loadPipelineSubModel(cl, pipeline.LLM, ml.ModelPath) if err != nil { return nil, fmt.Errorf("failed to load backend config: %w", err) @@ -534,7 +623,7 @@ func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model applyPipelineReasoning(cfgLLM, *pipeline) applyPipelineThinking(cfgLLM, *pipeline) - cfgTTS, err := cl.LoadModelConfigFileByName(pipeline.TTS, ml.ModelPath) + cfgTTS, err := loadPipelineSubModel(cl, pipeline.TTS, ml.ModelPath) if err != nil { return nil, fmt.Errorf("failed to load backend config: %w", err) @@ -544,11 +633,17 @@ func newModel(pipeline *config.Pipeline, cl *config.ModelConfigLoader, ml *model return nil, fmt.Errorf("failed to validate config: %w", err) } + cfgSound, err := loadSoundDetectionConfig(pipeline, cl, ml) + if err != nil { + return nil, err + } + wm := &wrappedModel{ - TTSConfig: cfgTTS, - TranscriptionConfig: cfgSST, - LLMConfig: cfgLLM, - VADConfig: cfgVAD, + TTSConfig: cfgTTS, + TranscriptionConfig: cfgSST, + LLMConfig: cfgLLM, + VADConfig: cfgVAD, + SoundDetectionConfig: cfgSound, confLoader: cl, modelLoader: ml, diff --git a/core/http/endpoints/openai/realtime_model_alias_test.go b/core/http/endpoints/openai/realtime_model_alias_test.go new file mode 100644 index 000000000..77179d963 --- 
/dev/null +++ b/core/http/endpoints/openai/realtime_model_alias_test.go @@ -0,0 +1,52 @@ +package openai + +import ( + "os" + "path/filepath" + + . "github.com/onsi/ginkgo/v2" + . "github.com/onsi/gomega" + + "github.com/mudler/LocalAI/core/config" +) + +// loadPipelineSubModel must resolve a pipeline sub-model that references an +// alias (e.g. `llm: default`) one hop to the alias target's full config — so +// the effective backend is the target's backend, not the empty backend of the +// alias stub. This mirrors the top-level alias resolution done in +// core/http/middleware/request.go, which the realtime pipeline previously +// skipped (failing in distributed mode with "backend name is empty"). +var _ = Describe("loadPipelineSubModel", func() { + It("resolves a sub-model alias one hop to the target's config", func() { + tmpDir := GinkgoT().TempDir() + + // A real model config with a concrete backend. + realLLM := `name: real-llm +backend: llama-cpp +parameters: + model: real-llm.gguf +` + Expect(os.WriteFile(filepath.Join(tmpDir, "real-llm.yaml"), []byte(realLLM), 0644)).To(Succeed()) + + // An alias pointing at the real model. + aliasCfg := `name: default +alias: real-llm +` + Expect(os.WriteFile(filepath.Join(tmpDir, "default.yaml"), []byte(aliasCfg), 0644)).To(Succeed()) + + cl := config.NewModelConfigLoader(tmpDir) + Expect(cl.LoadModelConfigsFromPath(tmpDir)).To(Succeed()) + + // Resolving the alias must follow the hop to the target's full config. + resolved, err := loadPipelineSubModel(cl, "default", tmpDir) + Expect(err).NotTo(HaveOccurred()) + Expect(resolved.IsAlias()).To(BeFalse()) + Expect(resolved.Backend).To(Equal("llama-cpp")) + + // A non-alias name must load unchanged. 
+ direct, err := loadPipelineSubModel(cl, "real-llm", tmpDir) + Expect(err).NotTo(HaveOccurred()) + Expect(direct.Backend).To(Equal("llama-cpp")) + Expect(direct.Name).To(Equal("real-llm")) + }) +}) diff --git a/core/http/endpoints/openai/realtime_sound_detection.go b/core/http/endpoints/openai/realtime_sound_detection.go new file mode 100644 index 000000000..6bc4efb47 --- /dev/null +++ b/core/http/endpoints/openai/realtime_sound_detection.go @@ -0,0 +1,48 @@ +package openai + +import ( + "context" + + "github.com/mudler/LocalAI/core/http/endpoints/openai/types" +) + +// defaultSoundDetectionTopK is the number of scored tags requested per +// committed utterance when the session does not pin its own top_k. +const defaultSoundDetectionTopK = 5 + +// emitSoundDetection classifies a committed utterance into sound-event tags and +// emits a conversation.item.sound_detection event for it. It mirrors +// emitTranscription's unary path: it calls the session's sound-event +// classifier, maps the scored tags onto the server event, and sends it over +// the transport. Sound detection is additive to transcription: its result is +// emitted independently and a failure here is the caller's to log, never a +// reason to abort the turn. 
+func emitSoundDetection(ctx context.Context, t Transport, session *Session, itemID, audioPath string) error { + topK := session.SoundDetectionTopK + if topK <= 0 { + topK = defaultSoundDetectionTopK + } + + result, err := session.ModelInterface.SoundDetection(ctx, audioPath, topK, session.SoundDetectionThreshold) + if err != nil { + return err + } + + detections := make([]types.SoundDetectionTag, 0) + if result != nil { + for _, d := range result.Detections { + detections = append(detections, types.SoundDetectionTag{ + Label: d.Label, + Score: d.Score, + Index: d.Index, + }) + } + } + + return t.SendEvent(types.ConversationItemSoundDetectionEvent{ + ServerEventBase: types.ServerEventBase{EventID: "event_TODO"}, + ItemID: itemID, + ContentIndex: 0, + Detections: detections, + }) +} diff --git a/core/http/endpoints/openai/realtime_sound_detection_test.go b/core/http/endpoints/openai/realtime_sound_detection_test.go new file mode 100644 index 000000000..e440e80c3 --- /dev/null +++ b/core/http/endpoints/openai/realtime_sound_detection_test.go @@ -0,0 +1,170 @@ +package openai + +import ( + "context" + "encoding/binary" + "errors" + "os" + + . "github.com/onsi/ginkgo/v2" + . "github.com/onsi/gomega" + + "github.com/mudler/LocalAI/core/config" + "github.com/mudler/LocalAI/core/http/endpoints/openai/types" + "github.com/mudler/LocalAI/core/schema" +) + +// emitSoundDetection classifies a committed utterance and emits a single +// conversation.item.sound_detection event carrying the scored AudioSet tags. 
+var _ = Describe("emitSoundDetection", func() { + It("emits a sound_detection event with the classifier's scored tags", func() { + session := &Session{ + SoundDetectionEnabled: true, + SoundDetectionTopK: 5, + ModelInterface: &fakeModel{ + soundDetectionResult: &schema.SoundClassificationResult{ + Model: "ced", + Detections: []schema.SoundClassification{ + {Index: 3, Label: "Baby cry, infant cry", Score: 0.91}, + {Index: 7, Label: "Speech", Score: 0.42}, + }, + }, + }, + } + t := &fakeTransport{} + + err := emitSoundDetection(context.Background(), t, session, "item1", "/tmp/x.wav") + + Expect(err).ToNot(HaveOccurred()) + Expect(t.countEvents(types.ServerEventTypeConversationItemSoundDetection)).To(Equal(1)) + + ev, ok := t.events[0].(types.ConversationItemSoundDetectionEvent) + Expect(ok).To(BeTrue()) + Expect(ev.ItemID).To(Equal("item1")) + Expect(ev.ContentIndex).To(Equal(0)) + Expect(ev.Detections).To(HaveLen(2)) + Expect(ev.Detections[0].Label).To(Equal("Baby cry, infant cry")) + Expect(ev.Detections[0].Score).To(BeNumerically("~", 0.91, 1e-6)) + Expect(ev.Detections[0].Index).To(Equal(3)) + Expect(ev.Detections[1].Label).To(Equal("Speech")) + }) + + It("emits an event with no detections when the classifier returns none", func() { + session := &Session{ + SoundDetectionEnabled: true, + ModelInterface: &fakeModel{ + soundDetectionResult: &schema.SoundClassificationResult{Model: "ced"}, + }, + } + t := &fakeTransport{} + + err := emitSoundDetection(context.Background(), t, session, "item1", "/tmp/x.wav") + + Expect(err).ToNot(HaveOccurred()) + Expect(t.countEvents(types.ServerEventTypeConversationItemSoundDetection)).To(Equal(1)) + ev, ok := t.events[0].(types.ConversationItemSoundDetectionEvent) + Expect(ok).To(BeTrue()) + Expect(ev.Detections).To(BeEmpty()) + }) + + It("propagates the classifier error and emits no event", func() { + session := &Session{ + SoundDetectionEnabled: true, + ModelInterface: &fakeModel{soundDetectionErr: errors.New("boom")}, + } + t 
:= &fakeTransport{} + + err := emitSoundDetection(context.Background(), t, session, "item1", "/tmp/x.wav") + + Expect(err).To(HaveOccurred()) + Expect(t.countEvents(types.ServerEventTypeConversationItemSoundDetection)).To(Equal(0)) + }) +}) + +// A sound-detection-only session (no transcription, no LLM) runs through +// commitUtterance WITHOUT the voice/transcription path: it emits the +// sound_detection event and stops - no transcription event, no LLM response. +var _ = Describe("commitUtterance (sound-detection-only session)", func() { + It("emits sound detection and neither transcribes nor generates a response", func() { + session := &Session{ + SoundDetectionEnabled: true, + SoundDetectionTopK: 5, + InputAudioTranscription: nil, // sound-only: no transcription stage + ModelConfig: &config.ModelConfig{}, + ModelInterface: &fakeModel{ + soundDetectionResult: &schema.SoundClassificationResult{ + Model: "ced", + Detections: []schema.SoundClassification{ + {Index: 23, Label: "Baby cry, infant cry", Score: 0.87}, + }, + }, + }, + } + tr := &fakeTransport{} + utt := make([]byte, 32) // non-empty PCM so commitUtterance proceeds + + commitUtterance(context.Background(), utt, session, &Conversation{}, tr) + + Expect(tr.countEvents(types.ServerEventTypeConversationItemSoundDetection)).To(Equal(1)) + // No transcription happened. + Expect(tr.countEvents(types.ServerEventTypeConversationItemInputAudioTranscriptionCompleted)).To(Equal(0)) + // No LLM response was generated (sound-only has no LLM stage). + Expect(tr.countEvents(types.ServerEventTypeResponseDone)).To(Equal(0)) + }) +}) + +// Server-side windowing (option B): a sound-only session classifies the last +// WindowMs of streamed audio per tick, with no client commit, and keeps the +// input buffer trimmed to one window. 
+var _ = Describe("classifySoundWindow (server-side windowing)", func() { + newSoundSession := func() (*Session, *fakeTransport) { + return &Session{ + SoundDetectionEnabled: true, + SoundDetectionTopK: 5, + SoundDetectionWindowMs: 200, // 200ms @ 16kHz mono16 = 6400 bytes + SoundDetectionHopMs: 20, + InputSampleRate: 16000, + ModelInterface: &fakeModel{ + soundDetectionResult: &schema.SoundClassificationResult{ + Model: "ced", + Detections: []schema.SoundClassification{{Index: 23, Label: "Baby cry, infant cry", Score: 0.87}}, + }, + }, + }, &fakeTransport{} + } + + It("emits a sound_detection event and trims the buffer to one window", func() { + session, tr := newSoundSession() + session.InputAudioBuffer = make([]byte, 10000) // > 6400-byte window + + classifySoundWindow(session, tr) + + Expect(tr.countEvents(types.ServerEventTypeConversationItemSoundDetection)).To(Equal(1)) + // buffer trimmed to exactly one window (200ms @ 16kHz mono 16-bit) + Expect(len(session.InputAudioBuffer)).To(Equal(6400)) + }) + + It("does nothing when too little audio is buffered", func() { + session, tr := newSoundSession() + session.InputAudioBuffer = make([]byte, 100) // < ~10ms (320 bytes) + + classifySoundWindow(session, tr) + + Expect(tr.countEvents(types.ServerEventTypeConversationItemSoundDetection)).To(Equal(0)) + }) +}) + +var _ = Describe("writeWindowWAV", func() { + It("writes a mono 16-bit WAV header declaring the given sample rate", func() { + pcm := make([]byte, 640) + path, err := writeWindowWAV(pcm, 24000) + Expect(err).ToNot(HaveOccurred()) + defer func() { _ = os.Remove(path) }() + + data, err := os.ReadFile(path) + Expect(err).ToNot(HaveOccurred()) + Expect(len(data)).To(BeNumerically(">=", 44+len(pcm))) + // SampleRate is a little-endian uint32 at byte offset 24 of a WAV header. 
+ Expect(binary.LittleEndian.Uint32(data[24:28])).To(Equal(uint32(24000))) + }) +}) diff --git a/core/http/endpoints/openai/realtime_speaker_event_test.go b/core/http/endpoints/openai/realtime_speaker_event_test.go new file mode 100644 index 000000000..fbbe5ded9 --- /dev/null +++ b/core/http/endpoints/openai/realtime_speaker_event_test.go @@ -0,0 +1,54 @@ +package openai + +import ( + "encoding/json" + + "github.com/mudler/LocalAI/core/http/endpoints/openai/types" + . "github.com/onsi/ginkgo/v2" + . "github.com/onsi/gomega" +) + +var _ = Describe("ConversationItemSpeakerEvent", func() { + It("marshals with the conversation.item.speaker type and nested speaker", func() { + ev := types.ConversationItemSpeakerEvent{ + ItemID: "item_123", + Speaker: types.Speaker{Name: "Jeremy", ID: "spk_1", Labels: map[string]string{"family": "yes"}, Confidence: 92, Distance: 0.1, Matched: true}, + } + b, err := json.Marshal(ev) + Expect(err).ToNot(HaveOccurred()) + + var got map[string]any + Expect(json.Unmarshal(b, &got)).To(Succeed()) + Expect(got["type"]).To(Equal("conversation.item.speaker")) + Expect(got["item_id"]).To(Equal("item_123")) + + spk := got["speaker"].(map[string]any) + Expect(spk["name"]).To(Equal("Jeremy")) + Expect(spk["id"]).To(Equal("spk_1")) + Expect(spk["matched"]).To(Equal(true)) + Expect(spk["labels"]).To(HaveKeyWithValue("family", "yes")) + }) + + It("omits labels when the speaker has none", func() { + ev := types.ConversationItemSpeakerEvent{ItemID: "i", Speaker: types.Speaker{Name: "Jeremy", Matched: true}} + b, err := json.Marshal(ev) + Expect(err).ToNot(HaveOccurred()) + var got map[string]any + Expect(json.Unmarshal(b, &got)).To(Succeed()) + spk := got["speaker"].(map[string]any) + _, hasLabels := spk["labels"] + Expect(hasLabels).To(BeFalse()) + }) + + It("omits the name for an unknown speaker but keeps matched=false", func() { + ev := types.ConversationItemSpeakerEvent{ItemID: "i", Speaker: types.Speaker{Matched: false}} + b, err := json.Marshal(ev) 
+ Expect(err).ToNot(HaveOccurred()) + var got map[string]any + Expect(json.Unmarshal(b, &got)).To(Succeed()) + spk := got["speaker"].(map[string]any) + _, hasName := spk["name"] + Expect(hasName).To(BeFalse()) + Expect(spk["matched"]).To(Equal(false)) + }) +}) diff --git a/core/http/endpoints/openai/realtime_transport_webrtc.go b/core/http/endpoints/openai/realtime_transport_webrtc.go index b687654bd..9ddec5edb 100644 --- a/core/http/endpoints/openai/realtime_transport_webrtc.go +++ b/core/http/endpoints/openai/realtime_transport_webrtc.go @@ -113,8 +113,13 @@ func (t *WebRTCTransport) sendLoop() { return } if err := t.dc.SendText(string(data)); err != nil { - xlog.Error("data channel send failed", "error", err) - return + // Drop just this event and keep the loop alive: a single + // failed send (e.g. an event over the negotiated SCTP + // max-message-size) must not tear down the session and + // silently drop every subsequent event. A genuinely dead + // transport is handled by the <-t.closed case. 
+ xlog.Error("data channel send failed, dropping event", "error", err) + continue } case <-t.closed: // Drain any remaining queued events before exiting @@ -122,7 +127,8 @@ func (t *WebRTCTransport) sendLoop() { select { case data := <-t.outEvents: if err := t.dc.SendText(string(data)); err != nil { - return + xlog.Error("data channel send failed while draining, dropping event", "error", err) + continue } default: return diff --git a/core/http/endpoints/openai/realtime_voicegate.go b/core/http/endpoints/openai/realtime_voicegate.go index 54332536f..9bd6f10f2 100644 --- a/core/http/endpoints/openai/realtime_voicegate.go +++ b/core/http/endpoints/openai/realtime_voicegate.go @@ -7,6 +7,7 @@ import ( "github.com/mudler/LocalAI/core/backend" "github.com/mudler/LocalAI/core/config" + "github.com/mudler/LocalAI/core/http/endpoints/openai/types" "github.com/mudler/LocalAI/core/services/voicerecognition" "github.com/mudler/LocalAI/pkg/model" ) @@ -29,6 +30,32 @@ type voiceGate struct { verifyFn func(ctx context.Context, uttWav, refWav string) (bool, error) } +// resolution is the outcome of resolving a committed utterance's speaker. It +// carries the client-facing Speaker plus the metadata the policy layer needs +// (labels for the allow-list) and a human reason when no usable identity exists. +type resolution struct { + speaker types.Speaker // name/id/confidence/distance/matched + labels map[string]string // identify-mode metadata labels, for the allow-list + found bool // a candidate identity existed at all + reason string // why-unknown / deny reason at the resolve level +} + +// confidence maps a cosine distance to a 0..100 score relative to the match +// threshold, mirroring the /v1/voice/identify endpoint.
+func confidence(distance, threshold float32) float32 { + if threshold <= 0 { + return 0 + } + c := (1 - distance/threshold) * 100 + if c < 0 { + return 0 + } + if c > 100 { + return 100 + } + return c +} + // newVoiceGate builds a gate from a pipeline's voice_recognition config. It // validates fail-fast (before loading the model), loads the recognition model // config, wires the real backend seams, and pre-embeds references for verify @@ -89,91 +116,143 @@ func newVoiceGate( return g, nil } -// Authorize embeds the utterance and decides allow/deny. -// -// allowed: speaker is authorized. -// matched: matched person's name (informational), empty if none. -// reason: human-readable deny reason. -// err: backend failure (caller should fail closed). -func (g *voiceGate) Authorize(ctx context.Context, wavPath string) (allowed bool, matched string, reason string, err error) { +// Resolve embeds the utterance once and resolves the speaker's identity. It does +// NOT apply the authorization policy (see authorize). On a backend error it +// returns the error and a resolution whose reason explains the failure. 
+func (g *voiceGate) Resolve(ctx context.Context, wavPath string) (resolution, error) { if g.cfg.Mode == config.VoiceGateModeVerify { - return g.authorizeVerify(ctx, wavPath) + return g.resolveVerify(ctx, wavPath) } - return g.authorizeIdentify(ctx, wavPath) + return g.resolveIdentify(ctx, wavPath) } -func (g *voiceGate) authorizeIdentify(ctx context.Context, wavPath string) (bool, string, string, error) { +func (g *voiceGate) resolveIdentify(ctx context.Context, wavPath string) (resolution, error) { emb, err := g.embedFn(ctx, wavPath) if err != nil { - return false, "", "embed failed", err + return resolution{reason: "embed failed"}, err } if len(emb) == 0 { - return false, "", "no speech detected", nil + return resolution{reason: "no speech detected"}, nil } matches, err := g.registry.Identify(ctx, emb, 1) if err != nil { - return false, "", "identify failed", err + return resolution{reason: "identify failed"}, err } if len(matches) == 0 { - return false, "", "unknown speaker", nil + return resolution{reason: "unknown speaker"}, nil } m := matches[0] - if m.Distance > g.cfg.Threshold { - return false, m.Metadata.Name, "distance above threshold", nil + matched := m.Distance <= g.cfg.Threshold + r := resolution{ + speaker: types.Speaker{ + Name: m.Metadata.Name, + ID: m.Metadata.ID, + Labels: m.Metadata.Labels, + Distance: m.Distance, + Confidence: confidence(m.Distance, g.cfg.Threshold), + Matched: matched, + }, + labels: m.Metadata.Labels, + found: true, } - if !g.allowMatch(m.Metadata) { - return false, m.Metadata.Name, "speaker not in allow list", nil + if !matched { + r.reason = "distance above threshold" } - return true, m.Metadata.Name, "", nil + return r, nil +} + +func (g *voiceGate) resolveVerify(ctx context.Context, wavPath string) (resolution, error) { + if g.cfg.AntiSpoofing { + for _, ref := range g.refAudios { + ok, err := g.verifyFn(ctx, wavPath, ref.Audio) + if err != nil { + return resolution{reason: "verify failed"}, err + } + if ok { + return 
resolution{ + speaker: types.Speaker{Name: ref.Name, Confidence: 100, Matched: true}, + found: true, + }, nil + } + } + return resolution{reason: "no reference matched"}, nil + } + + emb, err := g.embedFn(ctx, wavPath) + if err != nil { + return resolution{reason: "embed failed"}, err + } + if len(emb) == 0 { + return resolution{reason: "no speech detected"}, nil + } + for _, ref := range g.refEmbeds { + d := cosineDistance(emb, ref.emb) + if d <= g.cfg.Threshold { + return resolution{ + speaker: types.Speaker{Name: ref.name, Distance: d, Confidence: confidence(d, g.cfg.Threshold), Matched: true}, + found: true, + }, nil + } + } + return resolution{reason: "no reference matched"}, nil +} + +// authorize applies the gate's policy to an already-resolved identity. +func (g *voiceGate) authorize(r resolution) (allowed bool, reason string) { + if g.cfg.Mode == config.VoiceGateModeVerify { + if r.speaker.Matched { + return true, "" + } + if r.reason == "" { + return false, "no reference matched" + } + return false, r.reason + } + if !r.found { + return false, r.reason + } + if !r.speaker.Matched { + return false, "distance above threshold" + } + if !g.allowMatch(r.speaker.Name, r.labels) { + return false, "speaker not in allow list" + } + return true, "" } // allowMatch reports whether a matched identity is authorized. An empty allow // (no names and no labels) authorizes any registered speaker. 
-func (g *voiceGate) allowMatch(meta voicerecognition.Metadata) bool { +func (g *voiceGate) allowMatch(name string, labels map[string]string) bool { a := g.cfg.Allow if len(a.Names) == 0 && len(a.Labels) == 0 { return true } for _, n := range a.Names { - if n == meta.Name { + if n == name { return true } } for _, l := range a.Labels { - if _, ok := meta.Labels[l]; ok { + if _, ok := labels[l]; ok { return true } } return false } -func (g *voiceGate) authorizeVerify(ctx context.Context, wavPath string) (bool, string, string, error) { - if g.cfg.AntiSpoofing { - for _, r := range g.refAudios { - ok, err := g.verifyFn(ctx, wavPath, r.Audio) - if err != nil { - return false, "", "verify failed", err - } - if ok { - return true, r.Name, "", nil - } - } - return false, "", "no reference matched", nil +// Authorize is the legacy convenience wrapper: resolve then apply policy. +// +// allowed: speaker is authorized. +// matched: matched person's name (informational), empty if none. +// reason: human-readable deny reason. +// err: backend failure (caller should fail closed). 
+func (g *voiceGate) Authorize(ctx context.Context, wavPath string) (allowed bool, matched string, reason string, err error) { + r, rerr := g.Resolve(ctx, wavPath) + if rerr != nil { + return false, "", r.reason, rerr } - - emb, err := g.embedFn(ctx, wavPath) - if err != nil { - return false, "", "embed failed", err - } - if len(emb) == 0 { - return false, "", "no speech detected", nil - } - for _, r := range g.refEmbeds { - if cosineDistance(emb, r.emb) <= g.cfg.Threshold { - return true, r.name, "", nil - } - } - return false, "", "no reference matched", nil + allowed, reason = g.authorize(r) + return allowed, r.speaker.Name, reason, nil } // decide interprets an Authorize result against the gate's when-policy and the diff --git a/core/http/endpoints/openai/realtime_voicegate_integration_test.go b/core/http/endpoints/openai/realtime_voicegate_integration_test.go index f8aae72c5..b0f7f0b49 100644 --- a/core/http/endpoints/openai/realtime_voicegate_integration_test.go +++ b/core/http/endpoints/openai/realtime_voicegate_integration_test.go @@ -152,3 +152,252 @@ var _ = Describe("realtime voice gate integration (commitUtterance)", func() { Expect(tr2.countEvents(types.ServerEventTypeResponseDone)).To(BeNumerically(">=", 1)) }) }) + +var _ = Describe("realtime speaker surfacing (commitUtterance)", func() { + utt := make([]byte, 32) + + It("emits conversation.item.speaker for a confident match when announce is on", func() { + session, _ := itSession(itGate("alice", "alice", []float32{1, 0, 0}, nil, + config.VoiceGateWhenEvery, config.VoiceGateRejectEvent)) + session.voiceGate.cfg.Identity = &config.VoiceIdentityConfig{Announce: true} + tr := &fakeTransport{} + + commitUtterance(context.Background(), utt, session, &Conversation{}, tr) + + Expect(tr.countEvents(types.ServerEventTypeConversationItemSpeaker)).To(Equal(1)) + }) + + It("does not emit the speaker event for an unknown speaker unless announce_unknown is set", func() { + // match distance above threshold => not 
matched + gate := &voiceGate{ + cfg: config.PipelineVoiceRecognition{ + Mode: config.VoiceGateModeIdentify, Threshold: 0.25, + When: config.VoiceGateWhenEvery, OnReject: config.VoiceGateRejectEvent, + Enforce: boolPtr(false), + Identity: &config.VoiceIdentityConfig{Announce: true}, + }, + registry: &fakeRegistry{matches: []voicerecognition.Match{ + {Distance: 0.9, Metadata: voicerecognition.Metadata{Name: "alice"}}, + }}, + embedFn: func(context.Context, string) ([]float32, error) { return []float32{1, 0, 0}, nil }, + } + session, _ := itSession(gate) + tr := &fakeTransport{} + + commitUtterance(context.Background(), utt, session, &Conversation{}, tr) + Expect(tr.countEvents(types.ServerEventTypeConversationItemSpeaker)).To(Equal(0)) + + gate.cfg.Identity.AnnounceUnknown = true + tr2 := &fakeTransport{} + commitUtterance(context.Background(), utt, session, &Conversation{}, tr2) + Expect(tr2.countEvents(types.ServerEventTypeConversationItemSpeaker)).To(Equal(1)) + }) + + It("never drops a turn when enforce is false even for a disallowed speaker", func() { + session, _ := itSession(itGate("bob", "alice", []float32{1, 0, 0}, nil, + config.VoiceGateWhenEvery, config.VoiceGateRejectEvent)) + session.voiceGate.cfg.Enforce = boolPtr(false) + tr := &fakeTransport{} + + commitUtterance(context.Background(), utt, session, &Conversation{}, tr) + + Expect(hasSpeakerNotAuthorized(tr)).To(BeFalse()) + Expect(tr.countEvents(types.ServerEventTypeResponseDone)).To(BeNumerically(">=", 1)) + }) +}) + +var _ = Describe("realtime speaker personalization (triggerResponseAtTurn)", func() { + utt := make([]byte, 32) + + findRole := func(msgs schema.Messages, role string) *schema.Message { + for i := range msgs { + if msgs[i].Role == role { + return &msgs[i] + } + } + return nil + } + + It("sets the user message name and a current-speaker system note", func() { + session, m := itSession(itGate("alice", "alice", []float32{1, 0, 0}, nil, + config.VoiceGateWhenEvery, 
config.VoiceGateRejectEvent)) + session.voiceGate.cfg.Identity = &config.VoiceIdentityConfig{ + Personalize: true, InjectName: true, InjectSystemNote: true, + } + session.Instructions = "You are helpful." + tr := &fakeTransport{} + + commitUtterance(context.Background(), utt, session, &Conversation{}, tr) + + user := findRole(m.lastMessages, "user") + Expect(user).ToNot(BeNil()) + Expect(user.Name).To(Equal("alice")) + sys := findRole(m.lastMessages, "system") + Expect(sys).ToNot(BeNil()) + Expect(sys.StringContent).To(ContainSubstring("The current speaker is alice.")) + }) + + It("omits the unknown note unless note_unknown is set", func() { + base := func() (*Session, *fakeModel) { + gate := &voiceGate{ + cfg: config.PipelineVoiceRecognition{ + Mode: config.VoiceGateModeIdentify, Threshold: 0.25, + When: config.VoiceGateWhenEvery, OnReject: config.VoiceGateRejectEvent, + Enforce: boolPtr(false), + Identity: &config.VoiceIdentityConfig{Personalize: true, InjectSystemNote: true}, + }, + registry: &fakeRegistry{matches: []voicerecognition.Match{ + {Distance: 0.9, Metadata: voicerecognition.Metadata{Name: "alice"}}, + }}, + embedFn: func(context.Context, string) ([]float32, error) { return []float32{1, 0, 0}, nil }, + } + s, m := itSession(gate) + s.Instructions = "You are helpful." 
+ return s, m + } + + s1, m1 := base() + commitUtterance(context.Background(), utt, s1, &Conversation{}, &fakeTransport{}) + Expect(findRole(m1.lastMessages, "system").StringContent).ToNot(ContainSubstring("unknown")) + + s2, m2 := base() + s2.voiceGate.cfg.Identity.NoteUnknown = true + commitUtterance(context.Background(), utt, s2, &Conversation{}, &fakeTransport{}) + Expect(findRole(m2.lastMessages, "system").StringContent).To(ContainSubstring("The current speaker is unknown.")) + }) +}) + +var _ = Describe("realtime when:first with identity (commitUtterance)", func() { + utt := make([]byte, 32) + + // statefulIdentityGate builds a when:first identify gate with an Identity + // block (so identity is resolved every turn) whose embedFn is driven by a + // per-turn counter: the failOnSecond flag makes the second and later embeds + // return an error, exercising the stricter fail-closed path on a re-resolve. + statefulIdentityGate := func(failOnSecond bool) *voiceGate { + calls := 0 + return &voiceGate{ + cfg: config.PipelineVoiceRecognition{ + Mode: config.VoiceGateModeIdentify, + Threshold: 0.25, + When: config.VoiceGateWhenFirst, + OnReject: config.VoiceGateRejectEvent, + Allow: config.VoiceRecognitionAllow{Names: []string{"alice"}}, + Identity: &config.VoiceIdentityConfig{Announce: true, Personalize: true, InjectName: true}, + }, + registry: &fakeRegistry{matches: []voicerecognition.Match{ + {Distance: 0.1, Metadata: voicerecognition.Metadata{Name: "alice"}}, + }}, + embedFn: func(context.Context, string) ([]float32, error) { + calls++ + if failOnSecond && calls > 1 { + return nil, errors.New("embed backend down") + } + return []float32{1, 0, 0}, nil + }, + } + } + + It("re-resolves identity every turn and fails closed when a later embed errors", func() { + gate := statefulIdentityGate(true) + session, _ := itSession(gate) + conv := &Conversation{} // shared so voiceVerified persists across turns + + // Turn 1: authorized; identity resolved, speaker surfaced, 
response runs. + tr1 := &fakeTransport{} + commitUtterance(context.Background(), utt, session, conv, tr1) + Expect(hasSpeakerNotAuthorized(tr1)).To(BeFalse()) + Expect(tr1.countEvents(types.ServerEventTypeConversationItemSpeaker)).To(Equal(1)) + Expect(tr1.countEvents(types.ServerEventTypeResponseDone)).To(BeNumerically(">=", 1)) + + // Turn 2: when:first would skip re-authorization, but the Identity block + // forces a fresh resolve. That resolve now errors, and because the gate + // enforces, the turn is dropped fail-closed rather than riding on the + // cached first verification. + tr2 := &fakeTransport{} + commitUtterance(context.Background(), utt, session, conv, tr2) + Expect(hasSpeakerNotAuthorized(tr2)).To(BeTrue()) + Expect(tr2.countEvents(types.ServerEventTypeResponseDone)).To(Equal(0)) + }) + + It("re-resolves identity every turn so a later turn still surfaces and names the speaker", func() { + gate := statefulIdentityGate(false) + session, m := itSession(gate) + conv := &Conversation{} + + tr1 := &fakeTransport{} + commitUtterance(context.Background(), utt, session, conv, tr1) + Expect(hasSpeakerNotAuthorized(tr1)).To(BeFalse()) + Expect(tr1.countEvents(types.ServerEventTypeResponseDone)).To(BeNumerically(">=", 1)) + + // Turn 2: authorization is skipped (when:first, already verified) but the + // speaker event still fires and the per-message name is set, proving the + // per-turn re-resolution (not the cached first verification) drove it. 
+ tr2 := &fakeTransport{} + commitUtterance(context.Background(), utt, session, conv, tr2) + Expect(tr2.countEvents(types.ServerEventTypeConversationItemSpeaker)).To(Equal(1)) + var lastUser *schema.Message + for i := range m.lastMessages { + if m.lastMessages[i].Role == "user" { + lastUser = &m.lastMessages[i] + } + } + Expect(lastUser).ToNot(BeNil()) + Expect(lastUser.Name).To(Equal("alice")) + }) +}) + +var _ = Describe("realtime multi-speaker history attribution (triggerResponse)", func() { + userAudioItem := func(name, transcript string) *types.MessageItemUnion { + return &types.MessageItemUnion{ + User: &types.MessageItemUser{ + ID: generateItemID(), + Status: types.ItemStatusCompleted, + Speaker: &types.Speaker{Name: name, Matched: true}, + Content: []types.MessageContentInput{ + {Type: types.MessageContentTypeInputAudio, Transcript: transcript}, + }, + }, + } + } + + It("attributes each user turn to its own speaker and notes the latest one", func() { + session, m := itSession(itGate("alice", "alice", []float32{1, 0, 0}, nil, + config.VoiceGateWhenEvery, config.VoiceGateRejectEvent)) + session.Instructions = "You are helpful." 
+ session.MaxHistoryItems = 10 // keep both items; 0 would mean "no trim" too + session.voiceGate.cfg.Identity = &config.VoiceIdentityConfig{ + Personalize: true, InjectName: true, InjectSystemNote: true, + } + + conv := &Conversation{Items: []*types.MessageItemUnion{ + userAudioItem("alice", "hello there"), + userAudioItem("bob", "what is the weather"), + }} + tr := &fakeTransport{} + + triggerResponse(context.Background(), session, conv, tr, nil) + + var users []*schema.Message + var sys *schema.Message + for i := range m.lastMessages { + switch m.lastMessages[i].Role { + case "user": + users = append(users, &m.lastMessages[i]) + case "system": + if sys == nil { + sys = &m.lastMessages[i] + } + } + } + Expect(users).To(HaveLen(2)) + Expect(users[0].Name).To(Equal("alice")) + Expect(users[1].Name).To(Equal("bob")) + + Expect(sys).ToNot(BeNil()) + Expect(sys.StringContent).To(ContainSubstring("The current speaker is bob.")) + Expect(sys.StringContent).ToNot(ContainSubstring("alice")) + }) +}) + +func boolPtr(b bool) *bool { return &b } diff --git a/core/http/endpoints/openai/realtime_voicegate_test.go b/core/http/endpoints/openai/realtime_voicegate_test.go index bdbbc2f4a..3d9b458e1 100644 --- a/core/http/endpoints/openai/realtime_voicegate_test.go +++ b/core/http/endpoints/openai/realtime_voicegate_test.go @@ -10,6 +10,82 @@ import ( . 
"github.com/onsi/gomega" ) +var _ = Describe("voiceGate.Resolve + authorize", func() { + mkGate := func(allow []string) *voiceGate { + return &voiceGate{ + cfg: config.PipelineVoiceRecognition{ + Mode: config.VoiceGateModeIdentify, + Threshold: 0.25, + Allow: config.VoiceRecognitionAllow{Names: allow}, + }, + registry: &fakeRegistry{matches: []voicerecognition.Match{ + {Distance: 0.1, Metadata: voicerecognition.Metadata{ID: "spk_1", Name: "alice", Labels: map[string]string{"family": "yes"}}}, + }}, + embedFn: func(context.Context, string) ([]float32, error) { return []float32{1, 0, 0}, nil }, + } + } + + It("resolves a confident identity with name, id and a 0..100 confidence", func() { + r, err := mkGate(nil).Resolve(context.Background(), "x.wav") + Expect(err).ToNot(HaveOccurred()) + Expect(r.found).To(BeTrue()) + Expect(r.speaker.Name).To(Equal("alice")) + Expect(r.speaker.ID).To(Equal("spk_1")) + Expect(r.speaker.Matched).To(BeTrue()) + Expect(r.speaker.Confidence).To(BeNumerically(">", 0)) + Expect(r.speaker.Confidence).To(BeNumerically("<=", 100)) + Expect(r.speaker.Labels).To(HaveKeyWithValue("family", "yes")) + }) + + It("marks a candidate above the threshold as not matched", func() { + g := mkGate(nil) + g.registry = &fakeRegistry{matches: []voicerecognition.Match{ + {Distance: 0.9, Metadata: voicerecognition.Metadata{Name: "alice"}}, + }} + r, err := g.Resolve(context.Background(), "x.wav") + Expect(err).ToNot(HaveOccurred()) + Expect(r.found).To(BeTrue()) + Expect(r.speaker.Matched).To(BeFalse()) + Expect(r.speaker.Name).To(Equal("alice")) // name still surfaced + }) + + It("authorize allows a confident match in the allow list", func() { + g := mkGate([]string{"alice"}) + r, _ := g.Resolve(context.Background(), "x.wav") + allowed, reason := g.authorize(r) + Expect(allowed).To(BeTrue()) + Expect(reason).To(BeEmpty()) + }) + + It("authorize denies a confident match outside the allow list", func() { + g := mkGate([]string{"bob"}) + r, _ := 
g.Resolve(context.Background(), "x.wav") + allowed, reason := g.authorize(r) + Expect(allowed).To(BeFalse()) + Expect(reason).To(Equal("speaker not in allow list")) + }) + + It("authorize allows by label when names do not match", func() { + g := mkGate(nil) + g.cfg.Allow = config.VoiceRecognitionAllow{Labels: []string{"family"}} + r, _ := g.Resolve(context.Background(), "x.wav") + allowed, _ := g.authorize(r) + Expect(allowed).To(BeTrue()) + }) +}) + +var _ = Describe("confidence", func() { + It("is 100 at zero distance", func() { + Expect(confidence(0, 0.25)).To(BeNumerically("~", 100, 1e-4)) + }) + It("clamps to 0 above the threshold", func() { + Expect(confidence(0.5, 0.25)).To(BeNumerically("~", 0, 1e-4)) + }) + It("is 0 for a non-positive threshold", func() { + Expect(confidence(0.1, 0)).To(BeNumerically("~", 0, 1e-4)) + }) +}) + var _ = Describe("cosineDistance", func() { It("is 0 for identical vectors", func() { Expect(cosineDistance([]float32{1, 0, 0}, []float32{1, 0, 0})).To(BeNumerically("~", 0, 1e-6)) diff --git a/core/http/endpoints/openai/realtime_webrtc.go b/core/http/endpoints/openai/realtime_webrtc.go index 0ac982c19..26edf94ea 100644 --- a/core/http/endpoints/openai/realtime_webrtc.go +++ b/core/http/endpoints/openai/realtime_webrtc.go @@ -128,10 +128,13 @@ func RealtimeCalls(application *application.Application) echo.HandlerFunc { handleIncomingAudioTrack(track, transport) }) - // Set the remote SDP (client's offer) + // Set the remote SDP (client's offer). Raise the data-channel + // max-message-size the browser advertised so pion permits the larger + // realtime events some turns produce (e.g. tool calls), which would + // otherwise be dropped on send. See realtime_webrtc_sctp.go. 
if err := pc.SetRemoteDescription(webrtc.SessionDescription{ Type: webrtc.SDPTypeOffer, - SDP: req.SDP, + SDP: raiseDataChannelMaxMessageSize(req.SDP), }); err != nil { transport.Close() xlog.Error("failed to set remote description", "error", err) diff --git a/core/http/endpoints/openai/realtime_webrtc_sctp.go b/core/http/endpoints/openai/realtime_webrtc_sctp.go new file mode 100644 index 000000000..b0355ba70 --- /dev/null +++ b/core/http/endpoints/openai/realtime_webrtc_sctp.go @@ -0,0 +1,29 @@ +package openai + +import ( + "fmt" + "regexp" +) + +// realtimeDataChannelMaxMessageSize is the SCTP max-message-size LocalAI honors +// for the "oai-events" data channel, in bytes. +// +// Browsers advertise a conservative max-message-size in their SDP offer (Chrome +// uses 262144 = 256 KiB). pion enforces the remote's advertised value on send, +// so a single realtime event larger than it cannot be sent: the SendText fails, +// the event is dropped, and the turn silently yields no response. Some turns +// legitimately produce a single JSON event above 256 KiB (notably tool calls +// with sizeable schemas or results). Browsers advertise this value +// conservatively but their SCTP stacks reassemble much larger messages, so we +// raise the value honored for our own server-generated events. +const realtimeDataChannelMaxMessageSize = 16 * 1024 * 1024 // 16 MiB + +var maxMessageSizeAttrRe = regexp.MustCompile(`a=max-message-size:\d+`) + +// raiseDataChannelMaxMessageSize rewrites the SCTP max-message-size attribute in +// an SDP offer to realtimeDataChannelMaxMessageSize so pion permits larger +// outbound realtime events. Offers that don't carry the attribute are returned +// unchanged. 
+func raiseDataChannelMaxMessageSize(sdp string) string { + return maxMessageSizeAttrRe.ReplaceAllString(sdp, fmt.Sprintf("a=max-message-size:%d", realtimeDataChannelMaxMessageSize)) +} diff --git a/core/http/endpoints/openai/realtime_webrtc_sctp_test.go b/core/http/endpoints/openai/realtime_webrtc_sctp_test.go new file mode 100644 index 000000000..92da4e706 --- /dev/null +++ b/core/http/endpoints/openai/realtime_webrtc_sctp_test.go @@ -0,0 +1,33 @@ +package openai + +import ( + "fmt" + "strings" + + . "github.com/onsi/ginkgo/v2" + . "github.com/onsi/gomega" +) + +var _ = Describe("raiseDataChannelMaxMessageSize", func() { + It("raises a max-message-size the browser advertised", func() { + offer := "v=0\r\nm=application 9 UDP/DTLS/SCTP webrtc-datachannel\r\na=max-message-size:262144\r\n" + out := raiseDataChannelMaxMessageSize(offer) + Expect(out).To(ContainSubstring(fmt.Sprintf("a=max-message-size:%d", realtimeDataChannelMaxMessageSize))) + Expect(out).NotTo(ContainSubstring("a=max-message-size:262144")) + }) + + It("leaves an offer without the attribute unchanged", func() { + offer := "v=0\r\nm=application 9 UDP/DTLS/SCTP webrtc-datachannel\r\n" + Expect(raiseDataChannelMaxMessageSize(offer)).To(Equal(offer)) + }) + + It("rewrites every occurrence", func() { + offer := "a=max-message-size:1024\r\na=max-message-size:262144\r\n" + out := raiseDataChannelMaxMessageSize(offer) + Expect(strings.Count(out, fmt.Sprintf("a=max-message-size:%d", realtimeDataChannelMaxMessageSize))).To(Equal(2)) + }) + + It("raises above the 256 KiB browsers advertise", func() { + Expect(realtimeDataChannelMaxMessageSize).To(BeNumerically(">", 262144)) + }) +}) diff --git a/core/http/endpoints/openai/sound_classification.go b/core/http/endpoints/openai/sound_classification.go new file mode 100644 index 000000000..b7e23f1b1 --- /dev/null +++ b/core/http/endpoints/openai/sound_classification.go @@ -0,0 +1,91 @@ +package openai + +import ( + "io" + "net/http" + "os" + "path" + "path/filepath" 
+ + "github.com/labstack/echo/v4" + "github.com/mudler/LocalAI/core/backend" + "github.com/mudler/LocalAI/core/config" + "github.com/mudler/LocalAI/core/http/middleware" + "github.com/mudler/LocalAI/core/schema" + model "github.com/mudler/LocalAI/pkg/model" + + "github.com/mudler/xlog" +) + +// SoundClassificationEndpoint runs an audio-tagging / sound-event +// classification model (e.g. ced) over an uploaded clip and returns the +// scored AudioSet tags in score-descending order. It mirrors the +// transcription path: multipart audio upload -> temp file -> backend call. +// +// @Summary Classify sound events in audio (audio tagging). +// @Tags audio +// @accept multipart/form-data +// @Param model formData string true "model" +// @Param file formData file true "audio file" +// @Param top_k formData int false "number of top tags to return (0 = backend default)" +// @Param threshold formData number false "drop tags scoring below this value" +// @Success 200 {object} schema.SoundClassificationResult +// @Router /v1/audio/classification [post] +func SoundClassificationEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc { + return func(c echo.Context) error { + input, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_LOCALAI_REQUEST).(*schema.OpenAIRequest) + if !ok || input.Model == "" { + return echo.ErrBadRequest + } + + modelConfig, ok := c.Get(middleware.CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig) + if !ok || modelConfig == nil { + return echo.ErrBadRequest + } + + req := backend.SoundDetectionRequest{ + TopK: int32(parseFormInt(c, "top_k", 0)), + Threshold: float32(parseFormFloat(c, "threshold", 0)), + } + + file, err := c.FormFile("file") + if err != nil { + return err + } + f, err := file.Open() + if err != nil { + return err + } + defer func() { _ = f.Close() }() + + dir, err := os.MkdirTemp("", "sound-classification") + if err != nil { + return err + } + defer func() { _ = os.RemoveAll(dir) 
}() + + dst := filepath.Join(dir, path.Base(file.Filename)) + dstFile, err := os.Create(dst) // #nosec G304 -- dst is a server-created temp dir joined with path.Base of the upload name (no traversal) + if err != nil { + return err + } + if _, err := io.Copy(dstFile, f); err != nil { + xlog.Debug("Audio file copying error", "filename", file.Filename, "dst", dst, "error", err) + _ = dstFile.Close() + return err + } + _ = dstFile.Close() + req.Audio = dst + + result, err := backend.ModelSoundDetection(c.Request().Context(), req, ml, *modelConfig, appConfig) + if err != nil { + xlog.Error("Sound classification failed", + "model", modelConfig.Name, + "audio", dst, + "error", err) + return err + } + + return c.JSON(http.StatusOK, result) + } +} diff --git a/core/http/endpoints/openai/types/message_item.go b/core/http/endpoints/openai/types/message_item.go index 52997fe8c..88d680648 100644 --- a/core/http/endpoints/openai/types/message_item.go +++ b/core/http/endpoints/openai/types/message_item.go @@ -102,6 +102,10 @@ type MessageItemUser struct { // The status of the item. Has no effect on the conversation. Status ItemStatus `json:"status,omitempty"` + + // Speaker is the recognized speaker for this audio turn (LocalAI extension). + // Used to attribute past turns when rebuilding the LLM message history. 
+ Speaker *Speaker `json:"speaker,omitempty"` } func (m MessageItemUser) MessageItemType() MessageItemType { diff --git a/core/http/endpoints/openai/types/server_events.go b/core/http/endpoints/openai/types/server_events.go index bae680fd5..6b0a233ee 100644 --- a/core/http/endpoints/openai/types/server_events.go +++ b/core/http/endpoints/openai/types/server_events.go @@ -18,8 +18,12 @@ const ( ServerEventTypeConversationItemInputAudioTranscriptionDelta ServerEventType = "conversation.item.input_audio_transcription.delta" ServerEventTypeConversationItemInputAudioTranscriptionSegment ServerEventType = "conversation.item.input_audio_transcription.segment" ServerEventTypeConversationItemInputAudioTranscriptionFailed ServerEventType = "conversation.item.input_audio_transcription.failed" + ServerEventTypeConversationItemSoundDetection ServerEventType = "conversation.item.sound_detection" ServerEventTypeConversationItemTruncated ServerEventType = "conversation.item.truncated" ServerEventTypeConversationItemDeleted ServerEventType = "conversation.item.deleted" + // ServerEventTypeConversationItemSpeaker is a LocalAI extension: it reports + // the recognized speaker for a user audio item. OpenAI clients ignore it. + ServerEventTypeConversationItemSpeaker ServerEventType = "conversation.item.speaker" ServerEventTypeInputAudioBufferCommitted ServerEventType = "input_audio_buffer.committed" ServerEventTypeInputAudioBufferCleared ServerEventType = "input_audio_buffer.cleared" ServerEventTypeInputAudioBufferSpeechStarted ServerEventType = "input_audio_buffer.speech_started" @@ -335,6 +339,33 @@ func (m ConversationItemAddedEvent) MarshalJSON() ([]byte, error) { return json.Marshal(shadow) } +// ConversationItemSpeakerEvent reports the recognized speaker for a user audio +// item. LocalAI extension; not part of the OpenAI Realtime API. +type ConversationItemSpeakerEvent struct { + ServerEventBase + // ItemID is the conversation item this speaker belongs to. 
+ ItemID string `json:"item_id"` + // Speaker is the recognized identity. + Speaker Speaker `json:"speaker"` +} + +func (m ConversationItemSpeakerEvent) ServerEventType() ServerEventType { + return ServerEventTypeConversationItemSpeaker +} + +func (m ConversationItemSpeakerEvent) MarshalJSON() ([]byte, error) { + type typeAlias ConversationItemSpeakerEvent + type typeWrapper struct { + typeAlias + Type ServerEventType `json:"type"` + } + shadow := typeWrapper{ + typeAlias: typeAlias(m), + Type: m.ServerEventType(), + } + return json.Marshal(shadow) +} + // Returned when a conversation item is finalized. // // The event will include the full content of the Item except for audio data, which can be retrieved separately with a `conversation.item.retrieve` event if needed. @@ -443,6 +474,55 @@ func (m ConversationItemInputAudioTranscriptionCompletedEvent) MarshalJSON() ([] return json.Marshal(shadow) } +// SoundDetectionTag is one scored sound-event tag from the sound-event +// classifier. Label is the human-readable AudioSet class name, Score is the +// per-class probability (multi-label, independent), and Index is the class +// index in the model ontology. +type SoundDetectionTag struct { + // The human-readable AudioSet class name (e.g. "Baby cry, infant cry"). + Label string `json:"label"` + + // The per-class probability for this tag. + Score float32 `json:"score"` + + // The class index in the model ontology. + Index int `json:"index"` +} + +// Returned when a committed input audio window has been classified by a +// sound-event-detection model. This is a LocalAI extension to the OpenAI +// Realtime API: when a pipeline configures sound_detection, each VAD-committed +// utterance is run through the classifier and the scored AudioSet tags are +// emitted as this event, independent of (and alongside) transcription. +type ConversationItemSoundDetectionEvent struct { + ServerEventBase + // The ID of the item. 
+ ItemID string `json:"item_id"` + + // The index of the content part in the item's content array. + ContentIndex int `json:"content_index"` + + // The scored sound-event tags, in score-descending order. + Detections []SoundDetectionTag `json:"detections"` +} + +func (m ConversationItemSoundDetectionEvent) ServerEventType() ServerEventType { + return ServerEventTypeConversationItemSoundDetection +} + +func (m ConversationItemSoundDetectionEvent) MarshalJSON() ([]byte, error) { + type typeAlias ConversationItemSoundDetectionEvent + type typeWrapper struct { + typeAlias + Type ServerEventType `json:"type"` + } + shadow := typeWrapper{ + typeAlias: typeAlias(m), + Type: m.ServerEventType(), + } + return json.Marshal(shadow) +} + // Returned when the text value of an input audio transcription content part is updated with incremental transcription results. // // See https://platform.openai.com/docs/api-reference/realtime-server-events/conversation/item/input_audio_transcription/delta diff --git a/core/http/endpoints/openai/types/speaker.go b/core/http/endpoints/openai/types/speaker.go new file mode 100644 index 000000000..a5b02c927 --- /dev/null +++ b/core/http/endpoints/openai/types/speaker.go @@ -0,0 +1,14 @@ +package types + +// Speaker is the recognized speaker for a committed audio turn. It is a LocalAI +// extension to the OpenAI Realtime schema, carried on the user conversation item +// and surfaced via the conversation.item.speaker event. Confidence is a 0..100 +// score relative to the match threshold (same formula as /v1/voice/identify). 
+type Speaker struct { + Name string `json:"name,omitempty"` + ID string `json:"id,omitempty"` + Labels map[string]string `json:"labels,omitempty"` + Confidence float32 `json:"confidence"` + Distance float32 `json:"distance"` + Matched bool `json:"matched"` +} diff --git a/core/http/middleware/baseurl.go b/core/http/middleware/baseurl.go index a1e1844ae..84f72cf69 100644 --- a/core/http/middleware/baseurl.go +++ b/core/http/middleware/baseurl.go @@ -55,17 +55,70 @@ func BasePathPrefix(c echo.Context) string { // The returned URL is guaranteed to end with `/`. // The method should be used in conjunction with the StripPathPrefix middleware. func BaseURL(c echo.Context) string { + // An explicit external base URL (LOCALAI_BASE_URL) is authoritative for + // the origin. The proxy-derived path prefix is still appended so a + // reverse-proxy mount point keeps working. Trailing slashes are + // normalized via BasePathPrefix, which always starts and ends with "/". + if ext, ok := c.Get("_external_base_url").(string); ok && ext != "" { + return strings.TrimRight(ext, "/") + BasePathPrefix(c) + } + + fwdProto, fwdHost := parseForwarded(c.Request().Header.Get("Forwarded")) + scheme := "http" - if c.Request().Header.Get("X-Forwarded-Proto") == "https" { + switch { + case c.Request().TLS != nil: scheme = "https" - } else if c.Request().TLS != nil { + case strings.EqualFold(firstToken(c.Request().Header.Get("X-Forwarded-Proto")), "https"): + scheme = "https" + case strings.EqualFold(fwdProto, "https"): scheme = "https" } host := c.Request().Host if forwardedHost := c.Request().Header.Get("X-Forwarded-Host"); forwardedHost != "" { host = forwardedHost + } else if fwdHost != "" { + host = fwdHost } return scheme + "://" + host + BasePathPrefix(c) } + +// firstToken returns the first comma-separated token of v, trimmed of spaces. +// Reverse-proxy chains can emit X-Forwarded-Proto as "https,http"; only the +// first hop (closest to the client) is meaningful for scheme detection. 
+func firstToken(v string) string { + if i := strings.IndexByte(v, ','); i >= 0 { + v = v[:i] + } + return strings.TrimSpace(v) +} + +// parseForwarded extracts the proto and host directives from the first element +// of an RFC 7239 Forwarded header (e.g. `for=x;proto=https;host=h, for=y`). +// Values may be quoted. Returns empty strings when absent or malformed so the +// caller can fall through to other signals. +func parseForwarded(header string) (proto, host string) { + if header == "" { + return "", "" + } + // Only the first element (closest proxy to the client) matters here. + if i := strings.IndexByte(header, ','); i >= 0 { + header = header[:i] + } + for _, directive := range strings.Split(header, ";") { + key, value, ok := strings.Cut(strings.TrimSpace(directive), "=") + if !ok { + continue + } + value = strings.Trim(strings.TrimSpace(value), `"`) + switch strings.ToLower(strings.TrimSpace(key)) { + case "proto": + proto = value + case "host": + host = value + } + } + return proto, host +} diff --git a/core/http/middleware/baseurl_test.go b/core/http/middleware/baseurl_test.go index 4f6dbb1d1..6a132514b 100644 --- a/core/http/middleware/baseurl_test.go +++ b/core/http/middleware/baseurl_test.go @@ -135,4 +135,138 @@ var _ = Describe("BaseURL", func() { Entry("missing leading slash", "evil"), ) }) + + Context("scheme detection hardening", func() { + It("treats comma-separated X-Forwarded-Proto as https when first token is https", func() { + app := echo.New() + actualURL := "" + app.GET("/x", func(c echo.Context) error { + actualURL = BaseURL(c) + return nil + }) + req := httptest.NewRequest("GET", "/x", nil) + req.Header.Set("X-Forwarded-Proto", "https,http") + rec := httptest.NewRecorder() + app.ServeHTTP(rec, req) + Expect(actualURL).To(Equal("https://example.com/")) + }) + + It("derives https from the RFC 7239 Forwarded proto directive", func() { + app := echo.New() + actualURL := "" + app.GET("/x", func(c echo.Context) error { + actualURL = BaseURL(c) 
+ return nil + }) + req := httptest.NewRequest("GET", "/x", nil) + req.Header.Set("Forwarded", "for=192.0.2.1;proto=https;host=proxy.example") + rec := httptest.NewRecorder() + app.ServeHTTP(rec, req) + Expect(actualURL).To(Equal("https://proxy.example/")) + }) + + It("prefers X-Forwarded-Host over the Forwarded host directive", func() { + app := echo.New() + actualURL := "" + app.GET("/x", func(c echo.Context) error { + actualURL = BaseURL(c) + return nil + }) + req := httptest.NewRequest("GET", "/x", nil) + req.Header.Set("X-Forwarded-Host", "xfh.example") + req.Header.Set("Forwarded", "host=fwd.example;proto=https") + rec := httptest.NewRecorder() + app.ServeHTTP(rec, req) + Expect(actualURL).To(Equal("https://xfh.example/")) + }) + }) + + Context("explicit external base URL override", func() { + It("uses the configured origin over conflicting forwarded headers", func() { + app := echo.New() + actualURL := "" + app.GET("/x", func(c echo.Context) error { + c.Set("_external_base_url", "https://192.168.0.13:34567") + actualURL = BaseURL(c) + return nil + }) + req := httptest.NewRequest("GET", "/x", nil) + req.Header.Set("X-Forwarded-Proto", "http") + req.Header.Set("X-Forwarded-Host", "internal:8080") + rec := httptest.NewRecorder() + app.ServeHTTP(rec, req) + Expect(actualURL).To(Equal("https://192.168.0.13:34567/")) + }) + + It("combines the configured origin with a detected path prefix", func() { + app := echo.New() + actualURL := "" + app.GET("/hello", func(c echo.Context) error { + c.Set("_original_path", "/localai/hello") + c.Set("_external_base_url", "https://ext.example") + actualURL = BaseURL(c) + return nil + }) + req := httptest.NewRequest("GET", "/hello", nil) + rec := httptest.NewRecorder() + app.ServeHTTP(rec, req) + Expect(actualURL).To(Equal("https://ext.example/localai/")) + }) + + It("ignores an empty override", func() { + app := echo.New() + actualURL := "" + app.GET("/x", func(c echo.Context) error { + c.Set("_external_base_url", "") + actualURL 
= BaseURL(c) + return nil + }) + req := httptest.NewRequest("GET", "/x", nil) + rec := httptest.NewRecorder() + app.ServeHTTP(rec, req) + Expect(actualURL).To(Equal("http://example.com/")) + }) + }) + + Context("parseForwarded helper", func() { + It("parses unquoted proto and host", func() { + proto, host := parseForwarded("for=192.0.2.1;proto=https;host=h.example") + Expect(proto).To(Equal("https")) + Expect(host).To(Equal("h.example")) + }) + + It("strips quotes around values", func() { + proto, host := parseForwarded(`proto="https";host="h.example"`) + Expect(proto).To(Equal("https")) + Expect(host).To(Equal("h.example")) + }) + + It("uses only the first element of a multi-element header", func() { + proto, host := parseForwarded("proto=https;host=first.example, proto=http;host=second.example") + Expect(proto).To(Equal("https")) + Expect(host).To(Equal("first.example")) + }) + + It("returns empty strings for an empty header", func() { + proto, host := parseForwarded("") + Expect(proto).To(BeEmpty()) + Expect(host).To(BeEmpty()) + }) + + It("skips directives without a value", func() { + proto, host := parseForwarded("proto;host=h.example") + Expect(proto).To(BeEmpty()) + Expect(host).To(Equal("h.example")) + }) + }) + + Context("firstToken helper", func() { + It("returns the whole trimmed string when there is no comma", func() { + Expect(firstToken(" https ")).To(Equal("https")) + }) + + It("returns the first trimmed token when there is a comma", func() { + Expect(firstToken("https , http")).To(Equal("https")) + }) + }) }) diff --git a/core/http/middleware/request.go b/core/http/middleware/request.go index ff0d929ac..74f7e8565 100644 --- a/core/http/middleware/request.go +++ b/core/http/middleware/request.go @@ -167,6 +167,27 @@ func (re *RequestExtractor) SetModelAndConfig(initializer func() schema.LocalAIR } } + // Resolve a model alias to its target before the disabled check and + // before storing MODEL_CONFIG, so every modality (chat, embeddings, + // tts, 
image, ...) inherits redirection. The response keeps echoing + // the alias name (input.ModelName is left unchanged); usage accounting + // records requested=alias / served=target. + if cfg != nil && cfg.IsAlias() { + resolved, _, aliasErr := re.modelConfigLoader.ResolveAlias(cfg) + if aliasErr != nil { + return c.JSON(http.StatusBadRequest, schema.ErrorResponse{ + Error: &schema.APIError{ + Message: aliasErr.Error(), + Code: http.StatusBadRequest, + Type: "invalid_request_error", + }, + }) + } + c.Set(ContextKeyRequestedModel, modelName) + c.Set(ContextKeyServedModel, resolved.Name) + cfg = resolved + } + // Check if the model is disabled if cfg != nil && cfg.IsDisabled() { return c.JSON(http.StatusForbidden, schema.ErrorResponse{ diff --git a/core/http/middleware/request_test.go b/core/http/middleware/request_test.go index fe9fc926c..010379714 100644 --- a/core/http/middleware/request_test.go +++ b/core/http/middleware/request_test.go @@ -151,6 +151,107 @@ var _ = Describe("SetModelAndConfig middleware", func() { }) }) +// --------------------------------------------------------------------------- +// SetModelAndConfig - model alias resolution +// --------------------------------------------------------------------------- +// +// An alias config (`alias: `) is a pure redirect: the middleware must +// swap MODEL_CONFIG to the target config before the disabled check and before +// storing it, while leaving the response-facing model name as the alias. It +// also stamps routing.requested_model = alias and routing.served_model = +// target so usage accounting records both identities. 
+var _ = Describe("SetModelAndConfig alias resolution", func() { + var ( + modelDir string + capturedConfig *config.ModelConfig + capturedReq any + capturedServed any + app *echo.Echo + ) + + BeforeEach(func() { + var err error + modelDir, err = os.MkdirTemp("", "localai-alias-*") + Expect(err).ToNot(HaveOccurred()) + }) + + AfterEach(func() { + _ = os.RemoveAll(modelDir) + }) + + // buildApp seeds the loader from every YAML in modelDir (so an alias's + // target is present in the loader map) and wires a handler that captures + // the resolved config plus the stamped identity keys. + buildApp := func() *echo.Echo { + ss := &system.SystemState{Model: system.Model{ModelsPath: modelDir}} + appConfig := config.NewApplicationConfig() + appConfig.SystemState = ss + + mcl := config.NewModelConfigLoader(modelDir) + Expect(mcl.LoadModelConfigsFromPath(modelDir)).To(Succeed()) + ml := model.NewModelLoader(ss) + re := NewRequestExtractor(mcl, ml, appConfig) + + capturedConfig = nil + capturedReq = nil + capturedServed = nil + e := echo.New() + e.POST("/v1/chat/completions", + func(c echo.Context) error { + if cfg, ok := c.Get(CONTEXT_LOCALS_KEY_MODEL_CONFIG).(*config.ModelConfig); ok { + capturedConfig = cfg + } + capturedReq = c.Get(ContextKeyRequestedModel) + capturedServed = c.Get(ContextKeyServedModel) + return c.String(http.StatusOK, "ok") + }, + re.SetModelAndConfig(func() schema.LocalAIRequest { return new(schema.OpenAIRequest) }), + ) + return e + } + + It("serves the target config but keeps the alias name and stamps identity", func() { + Expect(os.WriteFile(filepath.Join(modelDir, "real.yaml"), + []byte("name: real\nbackend: llama-cpp\n"), 0644)).To(Succeed()) + Expect(os.WriteFile(filepath.Join(modelDir, "gpt-4.yaml"), + []byte("name: gpt-4\nalias: real\n"), 0644)).To(Succeed()) + app = buildApp() + + req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", + strings.NewReader(`{"model":"gpt-4","messages":[{"role":"user","content":"hi"}]}`)) + 
req.Header.Set("Content-Type", "application/json") + rec := httptest.NewRecorder() + app.ServeHTTP(rec, req) + + Expect(rec.Code).To(Equal(http.StatusOK)) + Expect(capturedConfig).ToNot(BeNil()) + // MODEL_CONFIG must be the target, not the alias stub. + Expect(capturedConfig.Name).To(Equal("real")) + Expect(capturedConfig.IsAlias()).To(BeFalse()) + // Identity stamps: requested = alias, served = target. + Expect(capturedReq).To(Equal("gpt-4")) + Expect(capturedServed).To(Equal("real")) + }) + + It("returns 400 when the alias target is missing", func() { + Expect(os.WriteFile(filepath.Join(modelDir, "gpt-4.yaml"), + []byte("name: gpt-4\nalias: nope\n"), 0644)).To(Succeed()) + app = buildApp() + + req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", + strings.NewReader(`{"model":"gpt-4","messages":[{"role":"user","content":"hi"}]}`)) + req.Header.Set("Content-Type", "application/json") + rec := httptest.NewRecorder() + app.ServeHTTP(rec, req) + + Expect(rec.Code).To(Equal(http.StatusBadRequest)) + var resp schema.ErrorResponse + Expect(json.Unmarshal(rec.Body.Bytes(), &resp)).To(Succeed()) + Expect(resp.Error).ToNot(BeNil()) + Expect(resp.Error.Type).To(Equal("invalid_request_error")) + }) +}) + // --------------------------------------------------------------------------- // MergeOpenResponsesConfig — tool_choice parsing // --------------------------------------------------------------------------- diff --git a/core/http/middleware/route_model.go b/core/http/middleware/route_model.go index 7ff286af4..470bd05f5 100644 --- a/core/http/middleware/route_model.go +++ b/core/http/middleware/route_model.go @@ -189,7 +189,12 @@ func RouteModel(loader *config.ModelConfigLoader, appConfig *config.ApplicationC } c.Set(CONTEXT_LOCALS_KEY_MODEL_CONFIG, result.ChosenConfig) - c.Set(ContextKeyRequestedModel, result.RouterModel) + // Preserve an upstream requested model (e.g. 
an alias that points + // at this router model) so accounting keeps the name the client + // actually sent. Served always reflects the final candidate. + if c.Get(ContextKeyRequestedModel) == nil { + c.Set(ContextKeyRequestedModel, result.RouterModel) + } c.Set(ContextKeyServedModel, result.ChosenModel) if store != nil { diff --git a/core/http/react-ui/e2e/alias-template.spec.js b/core/http/react-ui/e2e/alias-template.spec.js new file mode 100644 index 000000000..f3b1a0ca0 --- /dev/null +++ b/core/http/react-ui/e2e/alias-template.spec.js @@ -0,0 +1,77 @@ +import { test, expect } from './coverage-fixtures.js' + +// Alias / Routing template + Manage alias badge regression tests. +// +// An alias is a model config with `alias: ` that redirects traffic to +// the target model. This covers the two discoverability surfaces: +// - the create-flow template gallery exposes an "Alias / Routing" card that +// seeds a minimal name + alias config +// - the Manage Models tab renders a read-only "alias -> target" badge on +// rows that resolve to an alias (looked up via GET /api/aliases, since the +// capabilities row payload doesn't carry the alias field) + +// Minimal metadata so the editor renders the alias field once the template +// loads. Mirrors the Task 7 config-meta registry, which surfaces `alias` as a +// model-select component. 
+const ALIAS_METADATA = { + sections: [ + { id: 'general', label: 'General', icon: 'settings', order: 0 }, + { id: 'other', label: 'Other', icon: 'more-horizontal', order: 100 }, + ], + fields: [ + { path: 'name', yaml_key: 'name', go_type: 'string', ui_type: 'string', + section: 'general', label: 'Model Name', component: 'input', order: 0 }, + { path: 'alias', yaml_key: 'alias', go_type: 'string', ui_type: 'string', + section: 'general', label: 'Alias', component: 'model-select', autocomplete_provider: 'models', + description: 'Redirect this model name to another configured model.', order: 1 }, + ], +} + +test.describe('Alias template - create flow', () => { + test.beforeEach(async ({ page }) => { + await page.route('**/api/auth/status', (route) => + route.fulfill({ contentType: 'application/json', body: JSON.stringify({ authEnabled: false, staticApiKeyRequired: false, providers: [] }) })) + await page.route('**/api/models/config-metadata*', (route) => + route.fulfill({ contentType: 'application/json', body: JSON.stringify(ALIAS_METADATA) })) + await page.route('**/api/models/config-metadata/autocomplete/**', (route) => + route.fulfill({ contentType: 'application/json', body: JSON.stringify({ values: [] }) })) + + page.on('pageerror', (err) => { + throw new Error(`uncaught page error: ${err.message}`) + }) + }) + + test('template gallery exposes the Alias / Routing card', async ({ page }) => { + await page.goto('/app/model-editor') + await expect(page.getByRole('button', { name: /Alias \/ Routing/i })).toBeVisible({ timeout: 10_000 }) + }) + + test('alias template loads the editor with the alias field', async ({ page }) => { + await page.goto('/app/model-editor?template=alias') + await expect(page.getByText(/Unexpected Application Error/i)).toHaveCount(0) + await expect(page.locator('h1.page-title')).toBeVisible({ timeout: 10_000 }) + await expect(page.getByText('Alias').first()).toBeVisible() + }) +}) + +test.describe('Manage - alias badge', () => { + 
test.beforeEach(async ({ page }) => { + await page.route('**/api/auth/status', (route) => + route.fulfill({ contentType: 'application/json', body: JSON.stringify({ authEnabled: false, staticApiKeyRequired: false, providers: [] }) })) + await page.route('**/api/models/capabilities', (route) => + route.fulfill({ contentType: 'application/json', body: JSON.stringify({ data: [ + { id: 'fast-llm', capabilities: ['chat'], backend: 'llama-cpp' }, + { id: 'gpt-4', capabilities: ['chat'], backend: 'llama-cpp' }, + ] }) })) + await page.route('**/api/aliases', (route) => + route.fulfill({ contentType: 'application/json', body: JSON.stringify([{ name: 'gpt-4', target: 'fast-llm' }]) })) + }) + + test('renders a read-only alias -> target badge on aliased rows', async ({ page }) => { + await page.goto('/app/manage') + await expect(page.locator('.table')).toBeVisible({ timeout: 10_000 }) + + // The aliased row shows the target; the plain model row does not. + await expect(page.getByText('alias -> fast-llm')).toBeVisible({ timeout: 10_000 }) + }) +}) diff --git a/core/http/react-ui/e2e/manage-action-menu-position.spec.js b/core/http/react-ui/e2e/manage-action-menu-position.spec.js new file mode 100644 index 000000000..3f4301abe --- /dev/null +++ b/core/http/react-ui/e2e/manage-action-menu-position.spec.js @@ -0,0 +1,50 @@ +import { test, expect } from './coverage-fixtures.js' + +// Regression: opening a row's kebab (ActionMenu) on /app/manage used to snap +// the page scroll to the top and render the menu detached from its trigger, +// making it impossible to operate. Two causes: the menu auto-focus scrolled +// the page (no preventScroll), and the position:fixed popover was rendered +// inside a row whose hover `transform` re-anchored it. Fix portals the popover +// to document.body, positions it before paint, and focuses without scrolling. 
+test.describe('Manage Page - Action menu positioning', () => { + test('opening a row menu keeps scroll stable and places the menu by its trigger', async ({ page }) => { + // Small viewport so the page is scrollable and a scroll jump is observable. + await page.setViewportSize({ width: 1024, height: 500 }) + await page.goto('/app/manage') + await expect(page.locator('.table')).toBeVisible({ timeout: 10_000 }) + + const trigger = page.locator('button.action-menu__trigger').first() + await expect(trigger).toBeVisible() + + // Bring the trigger into view ourselves first, so the only scroll we then + // measure is the one the menu would (wrongly) cause - not Playwright's own + // scroll-into-view before the click. + await trigger.scrollIntoViewIfNeeded() + const scrollBefore = await page.evaluate(() => window.scrollY) + await trigger.click() + + const menu = page.locator('[role="menu"]') + await expect(menu).toBeVisible() + + // Behavioural symptom 1: focusing the menu must not yank the page scroll. + const scrollAfter = await page.evaluate(() => window.scrollY) + expect(scrollAfter).toBe(scrollBefore) + + // Behavioural symptom 2: the menu must sit next to its trigger, not float + // at the top of the window where it can't be operated. + const triggerBox = await trigger.boundingBox() + const menuBox = await menu.boundingBox() + expect(triggerBox).not.toBeNull() + expect(menuBox).not.toBeNull() + // Menu top is within ~24px of the trigger's bottom (below) or above it + // (flipped) — in all cases it tracks the trigger, never floating at y≈0. + const tracksTrigger = + Math.abs(menuBox.y - (triggerBox.y + triggerBox.height)) < 24 || + Math.abs((menuBox.y + menuBox.height) - triggerBox.y) < 24 + expect(tracksTrigger).toBe(true) + + // Mechanism: the popover must be portaled to document.body so position:fixed + // resolves against the viewport, not a transformed ancestor row. 
+ await expect(page.locator('body > .popover')).toHaveCount(1) + }) +}) diff --git a/core/http/react-ui/e2e/model-config.spec.js b/core/http/react-ui/e2e/model-config.spec.js index 2d7f0f8bd..96a73b543 100644 --- a/core/http/react-ui/e2e/model-config.spec.js +++ b/core/http/react-ui/e2e/model-config.spec.js @@ -288,6 +288,21 @@ test.describe('Model Editor - Interactive Tab', () => { await expect(page.locator('input[placeholder^="match,"]')).toBeVisible() }) + test('pattern min_len clamps a directly-typed negative to 0', async ({ page }) => { + const searchInput = page.locator('input[placeholder="Search fields to add..."]') + await searchInput.fill('Custom Secret Patterns') + const dropdown = searchInput.locator('..').locator('..') + await dropdown.locator('div', { hasText: 'Custom Secret Patterns' }).first().click() + + await page.locator('button', { hasText: 'Add pattern' }).click() + // The number input's min={0} only limits the spinner arrows, not keyboard + // entry; the editor must sanitise a typed negative so a meaningless + // negative length floor never reaches the saved config. + const minLen = page.locator('input[aria-label="Minimum length"]') + await minLen.fill('-5') + await expect(minLen).toHaveValue('0') + }) + // Regression: a map-typed field (entity_actions) present in the loaded YAML // must render WITH its values. flattenConfig used to recurse into the map, // scattering it across pii_detection.entity_actions. paths that match @@ -329,4 +344,37 @@ test.describe('Model Editor - Interactive Tab', () => { await expect(page.getByText(/block —/i).first()).toBeVisible() }) + // A map cannot hold two values for one key, so renaming a row to an existing + // group must collapse to a single row (Object.fromEntries, last write wins) + // rather than rendering two conflicting rows that silently lose one on save. 
+ test('entity_actions collapses a duplicate group to a single row', async ({ page }) => { + await page.route('**/api/models/edit/ner-model', (route) => { + route.fulfill({ + contentType: 'application/json', + body: JSON.stringify({ + name: 'ner-model', + config: [ + 'name: ner-model', + 'backend: llama-cpp', + 'pii_detection:', + ' entity_actions:', + ' SSN: block', + ' EMAIL: mask', + '', + ].join('\n'), + }), + }) + }) + + await page.goto('/app/model-editor/ner-model') + + const groupInputs = page.locator('input[aria-label="Entity group"]') + await expect(groupInputs).toHaveCount(2) + + // Rename the EMAIL row to duplicate SSN; the editor collapses to one SSN row. + await groupInputs.nth(1).fill('SSN') + await expect(groupInputs).toHaveCount(1) + await expect(groupInputs.nth(0)).toHaveValue('SSN') + }) + }) diff --git a/core/http/react-ui/e2e/nodes-detail.spec.js b/core/http/react-ui/e2e/nodes-detail.spec.js new file mode 100644 index 000000000..65690ba49 --- /dev/null +++ b/core/http/react-ui/e2e/nodes-detail.spec.js @@ -0,0 +1,34 @@ +import { test, expect } from './coverage-fixtures.js' + +const ID = 'n1' +async function mockNode(page) { + await page.route(`**/api/nodes/${ID}`, r => r.fulfill({ status: 200, contentType: 'application/json', + body: JSON.stringify({ id: ID, name: 'alpha', node_type: 'backend', address: '10.0.0.1:50051', status: 'healthy', total_vram: 24e9, available_vram: 12e9, max_replicas_per_model: 1, labels: { env: 'prod' } }) })) + await page.route(`**/api/nodes/${ID}/models`, r => r.fulfill({ status: 200, contentType: 'application/json', + body: JSON.stringify([{ node_id: ID, model_name: 'llama-3.3', state: 'loaded', in_flight: 0, replica_index: 0 }]) })) + await page.route(`**/api/nodes/${ID}/backends`, r => r.fulfill({ status: 200, contentType: 'application/json', + body: JSON.stringify([{ name: 'llama-cpp', is_system: true, installed_at: '2026-06-01T00:00:00Z' }]) })) +} + +test.describe('Node detail page', () => { + test('renders 
sections for a node', async ({ page }) => { + await mockNode(page) + await page.goto(`/app/nodes/${ID}`) + await expect(page.locator('.page-title').first()).toBeVisible({ timeout: 15_000 }) + await expect(page.getByText('alpha')).toBeVisible() + await expect(page.getByText('llama-3.3')).toBeVisible() + await expect(page.getByText('llama-cpp')).toBeVisible() + await expect(page.getByText('env=prod')).toBeVisible() + }) + + test('is reachable by clicking a roster panel', async ({ page }) => { + await page.route('**/api/nodes', r => r.fulfill({ status: 200, contentType: 'application/json', + body: JSON.stringify([{ id: ID, name: 'alpha', node_type: 'backend', address: '10.0.0.1:50051', status: 'healthy' }]) })) + await page.route('**/api/nodes/models', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' })) + await page.route('**/api/nodes/scheduling', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' })) + await mockNode(page) + await page.goto('/app/nodes') + await page.locator('.node-panel').filter({ hasText: 'alpha' }).getByText('alpha').click() + await expect(page).toHaveURL(new RegExp(`/app/nodes/${ID}$`)) + }) +}) diff --git a/core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js b/core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js index 76855437f..9ad92932c 100644 --- a/core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js +++ b/core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js @@ -12,28 +12,37 @@ const NODE_NAME = 'worker-test' const BACKEND_NAME = 'cuda12-vllm-development' async function mockDistributedNodes(page, { onDelete } = {}) { + const nodeRecord = { + id: NODE_ID, + name: NODE_NAME, + node_type: 'backend', + address: '10.0.0.1:50051', + http_address: '10.0.0.1:8090', + status: 'healthy', + total_vram: 0, + available_vram: 0, + total_ram: 8_000_000_000, + available_ram: 4_000_000_000, + gpu_vendor: '', + last_heartbeat: new Date().toISOString(), + created_at: new 
Date().toISOString(), + updated_at: new Date().toISOString(), + } + await page.route('**/api/nodes', (route) => { route.fulfill({ status: 200, contentType: 'application/json', - body: JSON.stringify([ - { - id: NODE_ID, - name: NODE_NAME, - node_type: 'backend', - address: '10.0.0.1:50051', - http_address: '10.0.0.1:8090', - status: 'healthy', - total_vram: 0, - available_vram: 0, - total_ram: 8_000_000_000, - available_ram: 4_000_000_000, - gpu_vendor: '', - last_heartbeat: new Date().toISOString(), - created_at: new Date().toISOString(), - updated_at: new Date().toISOString(), - }, - ]), + body: JSON.stringify([nodeRecord]), + }) + }) + + // The detail page fetches the single node via nodesApi.get(id). + await page.route(`**/api/nodes/${NODE_ID}`, (route) => { + route.fulfill({ + status: 200, + contentType: 'application/json', + body: JSON.stringify(nodeRecord), }) }) @@ -80,24 +89,18 @@ async function mockDistributedNodes(page, { onDelete } = {}) { }) } -async function expandNodeAndWaitForBackends(page) { - await page.goto('/app/nodes') - // Click the row to expand it. The chevron toggle and the row both work, - // but clicking the name cell is the most user-like. - await page.getByText(NODE_NAME).first().click() - // Backends, Capacity and Labels live behind a "Manage"
- // disclosure (the drawer was distilled to keep at-a-glance content - // lean — see distill refactor in the multi-replica branch). Open it - // by clicking the summary inside the .node-manage scope so the - // per-node backend table is in the DOM before assertions run. - await page.locator('.node-manage > summary').first().click() +async function openNodeDetail(page) { + // The per-node backend table now lives on the deep-linkable detail page + // at /app/nodes/:id (the old expand-row + "Manage" disclosure was removed + // when the roster was restructured). Navigate straight there. + await page.goto(`/app/nodes/${NODE_ID}`) await expect(page.getByRole('cell', { name: BACKEND_NAME, exact: true })).toBeVisible({ timeout: 10_000 }) } test.describe('Nodes page — per-node backend actions', () => { test('upgrade affordance is self-explanatory (not "Reinstall backend" with a sync icon)', async ({ page }) => { await mockDistributedNodes(page) - await expandNodeAndWaitForBackends(page) + await openNodeDetail(page) // Negative: the old, ambiguous wording must not be used. 
await expect(page.locator('button[title="Reinstall backend"]')).toHaveCount(0) @@ -114,7 +117,7 @@ test.describe('Nodes page — per-node backend actions', () => { test('per-node backend row shows a delete (trash) button next to upgrade', async ({ page }) => { await mockDistributedNodes(page) - await expandNodeAndWaitForBackends(page) + await openNodeDetail(page) const deleteBtn = page.locator('button[title="Delete backend from this node"]') await expect(deleteBtn).toBeVisible() @@ -128,7 +131,7 @@ test.describe('Nodes page — per-node backend actions', () => { postedBody = route.request().postDataJSON() }, }) - await expandNodeAndWaitForBackends(page) + await openNodeDetail(page) await page.locator('button[title="Delete backend from this node"]').click() @@ -150,7 +153,7 @@ test.describe('Nodes page — per-node backend actions', () => { deleteCalls += 1 }, }) - await expandNodeAndWaitForBackends(page) + await openNodeDetail(page) await page.locator('button[title="Delete backend from this node"]').click() diff --git a/core/http/react-ui/e2e/nodes-roster.spec.js b/core/http/react-ui/e2e/nodes-roster.spec.js new file mode 100644 index 000000000..861b94441 --- /dev/null +++ b/core/http/react-ui/e2e/nodes-roster.spec.js @@ -0,0 +1,47 @@ +import { test, expect } from './coverage-fixtures.js' + +async function mockCluster(page, nodes) { + await page.route('**/api/nodes', r => r.fulfill({ status: 200, contentType: 'application/json', body: JSON.stringify(nodes) })) + await page.route('**/api/nodes/models', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' })) + await page.route('**/api/nodes/scheduling', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' })) +} + +test.describe('Nodes roster header', () => { + test('shows a cluster pulse line and no stat-card grid', async ({ page }) => { + await mockCluster(page, [ + { id: 'n1', name: 'alpha', node_type: 'backend', address: '10.0.0.1:50051', status: 'healthy' }, + { id: 'n2', 
name: 'beta', node_type: 'backend', address: '10.0.0.2:50051', status: 'draining' }, + ]) + await page.goto('/app/nodes') + await expect(page.locator('.cluster-pulse')).toBeVisible({ timeout: 15_000 }) + await expect(page.locator('.cluster-pulse')).toContainText('2 nodes') + await expect(page.locator('.stat-grid')).toHaveCount(0) + }) + + test('shows an approval callout for pending nodes', async ({ page }) => { + await mockCluster(page, [{ id: 'n3', name: 'gamma', node_type: 'backend', address: '10.0.0.3:50051', status: 'pending' }]) + await page.goto('/app/nodes') + await expect(page.locator('.attention-callout')).toContainText('approval', { timeout: 15_000 }) + }) +}) + +test.describe('Nodes roster panels', () => { + test('shows model chips without clicking and filters by type', async ({ page }) => { + await page.route('**/api/nodes', r => r.fulfill({ status: 200, contentType: 'application/json', body: JSON.stringify([ + { id: 'n1', name: 'alpha', node_type: 'backend', address: '10.0.0.1:50051', status: 'healthy' }, + { id: 'a1', name: 'agent-1', node_type: 'agent', address: '10.0.0.9:50051', status: 'healthy' }, + ]) })) + await page.route('**/api/nodes/models', r => r.fulfill({ status: 200, contentType: 'application/json', body: JSON.stringify([ + { node_id: 'n1', model_name: 'llama-3.3', state: 'loaded', in_flight: 2, replica_index: 0 }, + ]) })) + await page.route('**/api/nodes/scheduling', r => r.fulfill({ status: 200, contentType: 'application/json', body: '[]' })) + + await page.goto('/app/nodes') + // model chip visible without any expand click + await expect(page.locator('.node-panel').filter({ hasText: 'alpha' }).getByText('llama-3.3')).toBeVisible({ timeout: 15_000 }) + // segmented filter: Agent shows the agent node, hides the backend node + await page.getByRole('radio', { name: /Agent/ }).click() + await expect(page.getByText('agent-1')).toBeVisible() + await expect(page.getByText('alpha')).toHaveCount(0) + }) +}) diff --git 
a/core/http/react-ui/e2e/page-render-smoke.spec.js b/core/http/react-ui/e2e/page-render-smoke.spec.js index 40cfa1897..5f89764e0 100644 --- a/core/http/react-ui/e2e/page-render-smoke.spec.js +++ b/core/http/react-ui/e2e/page-render-smoke.spec.js @@ -21,6 +21,7 @@ const PAGES = [ ['/app/backends', 'Backends'], ['/app/settings', 'Settings'], ['/app/nodes', 'Nodes'], + ['/app/scheduling', 'Scheduling'], ['/app/face', 'Face recognition'], ['/app/voice', 'Voice recognition'], ['/app/fine-tune', 'Fine-tuning'], diff --git a/core/http/react-ui/e2e/scheduling.spec.js b/core/http/react-ui/e2e/scheduling.spec.js new file mode 100644 index 000000000..a43f11be7 --- /dev/null +++ b/core/http/react-ui/e2e/scheduling.spec.js @@ -0,0 +1,16 @@ +import { test, expect } from './coverage-fixtures.js' + +test.describe('Scheduling page', () => { + test('renders at /app/scheduling with rules from the API', async ({ page }) => { + await page.route('**/api/nodes/scheduling', (route) => { + route.fulfill({ + status: 200, contentType: 'application/json', + body: JSON.stringify([{ model_name: 'llama-3.3', spread_all: true, min_replicas: 0, max_replicas: 0 }]), + }) + }) + await page.goto('/app/scheduling') + await expect(page.locator('.page-title').first()).toBeVisible({ timeout: 15_000 }) + await expect(page).toHaveURL(/\/app\/scheduling$/) + await expect(page.getByText('llama-3.3')).toBeVisible() + }) +}) diff --git a/core/http/react-ui/public/locales/de/admin.json b/core/http/react-ui/public/locales/de/admin.json index 88582b5a2..3bf9daa68 100644 --- a/core/http/react-ui/public/locales/de/admin.json +++ b/core/http/react-ui/public/locales/de/admin.json @@ -43,6 +43,10 @@ "title": "Verteilte Knoten", "subtitle": "Backend- und Agenten-Worker-Knoten verwalten" }, + "scheduling": { + "title": "Planung", + "subtitle": "Modellplatzierung und Replikat-Regeln im gesamten Cluster" + }, "p2p": { "title": "Verteilte KI-Berechnung", "subtitle": "Skalieren Sie Ihre KI-Workloads über mehrere Geräte mit 
Peer-to-Peer-Verteilung" diff --git a/core/http/react-ui/public/locales/de/nav.json b/core/http/react-ui/public/locales/de/nav.json index 29f5c65d6..f2950da2d 100644 --- a/core/http/react-ui/public/locales/de/nav.json +++ b/core/http/react-ui/public/locales/de/nav.json @@ -50,6 +50,7 @@ "backends": "Backends", "traces": "Traces", "nodes": "Knoten", + "scheduling": "Planung", "swarm": "Swarm", "system": "System", "settings": "Einstellungen", diff --git a/core/http/react-ui/public/locales/en/admin.json b/core/http/react-ui/public/locales/en/admin.json index f4a380ae3..05155dd25 100644 --- a/core/http/react-ui/public/locales/en/admin.json +++ b/core/http/react-ui/public/locales/en/admin.json @@ -43,6 +43,10 @@ "title": "Distributed Nodes", "subtitle": "Manage backend and agent worker nodes" }, + "scheduling": { + "title": "Scheduling", + "subtitle": "Model placement and replica rules across the cluster" + }, "p2p": { "title": "Distributed AI Computing", "subtitle": "Scale your AI workloads across multiple devices with peer-to-peer distribution" diff --git a/core/http/react-ui/public/locales/en/chat.json b/core/http/react-ui/public/locales/en/chat.json index de9d0507d..ffda226db 100644 --- a/core/http/react-ui/public/locales/en/chat.json +++ b/core/http/react-ui/public/locales/en/chat.json @@ -86,6 +86,7 @@ "input": { "placeholder": "Message...", "attachFile": "Attach file", + "send": "Send message", "stopGenerating": "Stop generating", "canvasTitle": "Canvas — extract code blocks and media into a side panel for preview, copy, and download", "canvasLabel": "Canvas", diff --git a/core/http/react-ui/public/locales/en/home.json b/core/http/react-ui/public/locales/en/home.json index fabd9e9dd..35533a5a8 100644 --- a/core/http/react-ui/public/locales/en/home.json +++ b/core/http/react-ui/public/locales/en/home.json @@ -77,6 +77,21 @@ "noModelsTitle": "No Models Available", "noModelsBody": "There are no models installed yet. 
Ask your administrator to set up models so you can start chatting." }, + "starters": { + "title": "Recommended for your hardware", + "tier": { + "cpu": "CPU-only", + "gpu-small": "GPU", + "gpu-mid": "GPU", + "gpu-large": "GPU" + }, + "cpuNote": "No GPU detected — these small models stay responsive on CPU.", + "gpuNote": "Picked to fit your available VRAM with room for context.", + "install": "Install", + "installing": "Installing", + "installStarted": "Installing {{model}}…", + "installFailed": "Install failed: {{message}}" + }, "connect": { "title": "One endpoint, every API", "subtitle": "LocalAI serves its own full API — image & video generation, depth, object detection, reranking, audio, face & voice recognition, and realtime voice over WebRTC and WebSocket. On top of that, a drop-in compatibility layer lets any app built for OpenAI, Anthropic, Ollama or OpenAI Responses talk to it unchanged.", diff --git a/core/http/react-ui/public/locales/en/models.json b/core/http/react-ui/public/locales/en/models.json index 9af2d77a9..bd23d389e 100644 --- a/core/http/react-ui/public/locales/en/models.json +++ b/core/http/react-ui/public/locales/en/models.json @@ -2,6 +2,16 @@ "title": "Install Models", "subtitle": "Browse and install AI models from the gallery", "models": "Models", + "recommended": { + "title": "Recommended for your hardware", + "cpuNote": "No GPU detected - small models that stay responsive on CPU.", + "gpuNote": "Sized to fit your available VRAM with room for context.", + "install": "Install", + "installing": "Installing", + "installStarted": "Installing {{model}}…", + "installFailed": "Install failed: {{message}}", + "dismiss": "Dismiss recommendations" + }, "stats": { "available": "Available", "installed": "Installed" @@ -23,6 +33,7 @@ "tts": "TTS", "stt": "STT", "diarization": "Diarization", + "soundClassification": "Sound Tagging", "soundGen": "Sound", "audioTransform": "Audio FX", "realtimeAudio": "Realtime Audio", diff --git 
a/core/http/react-ui/public/locales/en/nav.json b/core/http/react-ui/public/locales/en/nav.json index 20c8e1599..5423438f9 100644 --- a/core/http/react-ui/public/locales/en/nav.json +++ b/core/http/react-ui/public/locales/en/nav.json @@ -51,6 +51,7 @@ "backends": "Backends", "traces": "Traces", "nodes": "Nodes", + "scheduling": "Scheduling", "swarm": "Swarm", "system": "System", "settings": "Settings", diff --git a/core/http/react-ui/public/locales/es/admin.json b/core/http/react-ui/public/locales/es/admin.json index fee37c1ab..1d4b61180 100644 --- a/core/http/react-ui/public/locales/es/admin.json +++ b/core/http/react-ui/public/locales/es/admin.json @@ -43,6 +43,10 @@ "title": "Nodos distribuidos", "subtitle": "Administra nodos worker de backends y agentes" }, + "scheduling": { + "title": "Planificación", + "subtitle": "Reglas de ubicación de modelos y réplicas en el clúster" + }, "p2p": { "title": "Computación de IA distribuida", "subtitle": "Escala tus cargas de trabajo de IA en múltiples dispositivos con distribución peer-to-peer" diff --git a/core/http/react-ui/public/locales/es/nav.json b/core/http/react-ui/public/locales/es/nav.json index bbb2084fe..a1ed97bca 100644 --- a/core/http/react-ui/public/locales/es/nav.json +++ b/core/http/react-ui/public/locales/es/nav.json @@ -50,6 +50,7 @@ "backends": "Backends", "traces": "Trazas", "nodes": "Nodos", + "scheduling": "Planificación", "swarm": "Swarm", "system": "Sistema", "settings": "Configuración", diff --git a/core/http/react-ui/public/locales/id/admin.json b/core/http/react-ui/public/locales/id/admin.json index 5e83eb37f..28fa5829c 100644 --- a/core/http/react-ui/public/locales/id/admin.json +++ b/core/http/react-ui/public/locales/id/admin.json @@ -43,6 +43,10 @@ "title": "Node Terdistribusi", "subtitle": "Kelola node backend dan node worker" }, + "scheduling": { + "title": "Penjadwalan", + "subtitle": "Aturan penempatan model dan replika di seluruh kluster" + }, "p2p": { "title": "Komputasi AI 
Terdistribusi", "subtitle": "Skalakan beban kerja AI Anda ke beberapa perangkat dengan distribusi peer-to-peer" @@ -82,4 +86,4 @@ "title": "Penjelajah", "subtitle": "Jelajahi file dan konfigurasi" } -} \ No newline at end of file +} diff --git a/core/http/react-ui/public/locales/id/chat.json b/core/http/react-ui/public/locales/id/chat.json index c79edaeb4..b9216e325 100644 --- a/core/http/react-ui/public/locales/id/chat.json +++ b/core/http/react-ui/public/locales/id/chat.json @@ -72,7 +72,7 @@ "actions": { "copy": "Salin", "regenerate": "Hasilkan ulang", - "jumpToLatest": "Jump to latest" + "jumpToLatest": "Lompat ke terbaru" }, "streaming": { "transferring": "Mentransfer model...", @@ -115,4 +115,4 @@ "clearAll": "Hapus semua", "deleteAllTitle": "Hapus semua percakapan" } -} \ No newline at end of file +} diff --git a/core/http/react-ui/public/locales/id/common.json b/core/http/react-ui/public/locales/id/common.json index 711b056df..3fc28806d 100644 --- a/core/http/react-ui/public/locales/id/common.json +++ b/core/http/react-ui/public/locales/id/common.json @@ -1,8 +1,8 @@ { "unsaved": { - "title": "Discard unsaved changes?", - "message": "You have unsaved changes that will be lost if you leave this page.", - "leave": "Leave" + "title": "Buang perubahan yang belum disimpan?", + "message": "Anda memiliki perubahan yang belum disimpan. 
Perubahan tersebut akan hilang jika Anda meninggalkan halaman ini.", + "leave": "Tinggalkan Halaman" }, "actions": { "save": "Simpan", diff --git a/core/http/react-ui/public/locales/id/home.json b/core/http/react-ui/public/locales/id/home.json index 368a40709..4e2aafdcb 100644 --- a/core/http/react-ui/public/locales/id/home.json +++ b/core/http/react-ui/public/locales/id/home.json @@ -7,15 +7,15 @@ "resourceGpu": "GPU", "resourceRam": "RAM", "greeting": { - "morning": "Good morning", - "afternoon": "Good afternoon", - "evening": "Good evening", - "night": "Working late" + "morning": "Selamat pagi", + "afternoon": "Selamat siang", + "evening": "Selamat malam", + "night": "Selamat lembur" }, "statusLine": { - "modelsLoaded_one": "{{count}} model loaded", - "modelsLoaded_other": "{{count}} models loaded", - "noModelsLoaded": "No models loaded", + "modelsLoaded_one": "{{count}} model dimuat", + "modelsLoaded_other": "{{count}} model dimuat", + "noModelsLoaded": "Tidak ada model yang dimuat", "nodes_one": "{{count}} node", "nodes_other": "{{count}} nodes" }, @@ -79,14 +79,14 @@ }, "connect": { "title": "Satu endpoint, semua API", - "subtitle": "LocalAI menyediakan API miliknya sendiri yang lengkap — pembuatan gambar & video, depth, deteksi objek, reranking, audio, pengenalan wajah & suara, serta suara realtime melalui WebRTC dan WebSocket. Di atas itu, lapisan kompatibilitas drop-in membuat aplikasi apa pun yang dibuat untuk OpenAI, Anthropic, Ollama, atau OpenAI Responses bekerja tanpa perubahan.", + "subtitle": "LocalAI menyediakan API miliknya sendiri yang lengkap — pembuatan gambar & video, depth, deteksi objek, reranking, audio, pengenalan wajah & suara, serta suara realtime melalui WebRTC dan WebSocket. 
Selain itu, lapisan kompatibilitas drop-in membuat aplikasi apa pun yang dibuat untuk OpenAI, Anthropic, Ollama, atau OpenAI Responses bekerja tanpa perubahan.", "nativeTitle": "API native", "compatTitle": "Kompatibilitas drop-in", "apiReference": "Referensi API lengkap", "copy": "Salin", "copied": "Disalin", - "browse": "Browse the API", - "hide": "Hide endpoints", - "dismiss": "Dismiss" + "browse": "Jelajahi API", + "hide": "Sembunyikan endpoint", + "dismiss": "Abaikan" } } diff --git a/core/http/react-ui/public/locales/id/media.json b/core/http/react-ui/public/locales/id/media.json index b49670c63..10350967b 100644 --- a/core/http/react-ui/public/locales/id/media.json +++ b/core/http/react-ui/public/locales/id/media.json @@ -5,7 +5,7 @@ "video": "Video", "tts": "TTS", "sound": "Suara", - "transform": "Transform" + "transform": "Transformasi" } }, "image": { @@ -30,7 +30,7 @@ "refImagesAdded_other": "{{count}} gambar ditambahkan" }, "actions": { - "view": "View", + "view": "Lihat", "generate": "Hasilkan", "generating": "Menghasilkan..." 
}, @@ -153,4 +153,4 @@ "clearConfirm": "Hapus", "cleared": "Riwayat dihapus" } -} \ No newline at end of file +} diff --git a/core/http/react-ui/public/locales/id/nav.json b/core/http/react-ui/public/locales/id/nav.json index 34d025277..c13c197d9 100644 --- a/core/http/react-ui/public/locales/id/nav.json +++ b/core/http/react-ui/public/locales/id/nav.json @@ -19,11 +19,11 @@ "operate": "Operasikan" }, "operate": { - "inference": "Inference", - "cluster": "Cluster", - "observability": "Observability", - "access": "Access", - "system": "System" + "inference": "Inferensi", + "cluster": "Kluster", + "observability": "Observabilitas", + "access": "Akses", + "system": "Sistem" }, "items": { "home": "Beranda", @@ -51,6 +51,7 @@ "backends": "Backend", "traces": "Trace", "nodes": "Node", + "scheduling": "Penjadwalan", "swarm": "Swarm", "system": "Sistem", "settings": "Pengaturan", @@ -63,7 +64,7 @@ "copyright": "© 2023-{{year}} {{author}}" }, "console": { - "automation": "Otomasi", + "automation": "Automasi", "training": "Pelatihan" } } diff --git a/core/http/react-ui/public/locales/it/admin.json b/core/http/react-ui/public/locales/it/admin.json index 2bd575b66..323bae421 100644 --- a/core/http/react-ui/public/locales/it/admin.json +++ b/core/http/react-ui/public/locales/it/admin.json @@ -43,6 +43,10 @@ "title": "Nodi distribuiti", "subtitle": "Gestisci i nodi worker dei backend e degli agenti" }, + "scheduling": { + "title": "Pianificazione", + "subtitle": "Regole di posizionamento dei modelli e delle repliche nel cluster" + }, "p2p": { "title": "Calcolo AI distribuito", "subtitle": "Scala i tuoi carichi di lavoro AI su più dispositivi con la distribuzione peer-to-peer" diff --git a/core/http/react-ui/public/locales/it/nav.json b/core/http/react-ui/public/locales/it/nav.json index 492f4b8db..c54171f39 100644 --- a/core/http/react-ui/public/locales/it/nav.json +++ b/core/http/react-ui/public/locales/it/nav.json @@ -50,6 +50,7 @@ "backends": "Backend", "traces": "Tracce", 
"nodes": "Nodi", + "scheduling": "Pianificazione", "swarm": "Swarm", "system": "Sistema", "settings": "Impostazioni", diff --git a/core/http/react-ui/public/locales/ko/admin.json b/core/http/react-ui/public/locales/ko/admin.json index 726eaed65..1b6676571 100644 --- a/core/http/react-ui/public/locales/ko/admin.json +++ b/core/http/react-ui/public/locales/ko/admin.json @@ -43,6 +43,10 @@ "title": "분산 노드", "subtitle": "백엔드 및 에이전트 워커 노드를 관리합니다" }, + "scheduling": { + "title": "스케줄링", + "subtitle": "클러스터 전반의 모델 배치 및 복제본 규칙" + }, "p2p": { "title": "분산 AI 컴퓨팅", "subtitle": "피어 투 피어 분산으로 여러 기기에 걸쳐 AI 워크로드를 확장합니다" diff --git a/core/http/react-ui/public/locales/ko/nav.json b/core/http/react-ui/public/locales/ko/nav.json index 98902880d..dbd2016cc 100644 --- a/core/http/react-ui/public/locales/ko/nav.json +++ b/core/http/react-ui/public/locales/ko/nav.json @@ -51,6 +51,7 @@ "backends": "백엔드", "traces": "트레이스", "nodes": "노드", + "scheduling": "스케줄링", "swarm": "Swarm", "system": "시스템", "settings": "설정", diff --git a/core/http/react-ui/public/locales/zh-CN/admin.json b/core/http/react-ui/public/locales/zh-CN/admin.json index d55487e69..c5d9db452 100644 --- a/core/http/react-ui/public/locales/zh-CN/admin.json +++ b/core/http/react-ui/public/locales/zh-CN/admin.json @@ -43,6 +43,10 @@ "title": "分布式节点", "subtitle": "管理后端和智能体工作节点" }, + "scheduling": { + "title": "调度", + "subtitle": "集群中的模型放置和副本规则" + }, "p2p": { "title": "分布式 AI 计算", "subtitle": "通过点对点分发将您的 AI 工作负载扩展到多个设备" diff --git a/core/http/react-ui/public/locales/zh-CN/nav.json b/core/http/react-ui/public/locales/zh-CN/nav.json index 58805eec1..730791ddd 100644 --- a/core/http/react-ui/public/locales/zh-CN/nav.json +++ b/core/http/react-ui/public/locales/zh-CN/nav.json @@ -50,6 +50,7 @@ "backends": "后端", "traces": "追踪", "nodes": "节点", + "scheduling": "调度", "swarm": "Swarm", "system": "系统", "settings": "设置", diff --git a/core/http/react-ui/src/App.css b/core/http/react-ui/src/App.css index 81d225080..4578a3dd8 100644 --- 
a/core/http/react-ui/src/App.css +++ b/core/http/react-ui/src/App.css @@ -6363,6 +6363,130 @@ select.input { justify-content: center; } +/* ──────────────────── Home: hardware-aware starter models ──────────────────── */ + +.home-starters { + margin: var(--spacing-lg) 0; + padding: var(--spacing-lg); +} +.home-starters-head { + display: flex; + align-items: center; + justify-content: space-between; + gap: var(--spacing-md); +} +.home-starters-head strong { + font-size: 0.9375rem; +} +.home-starters-tier { + display: inline-flex; + align-items: center; + gap: var(--spacing-xs); + font-size: 0.75rem; + color: var(--color-text-muted); +} +.home-starters-sub { + margin: var(--spacing-xs) 0 var(--spacing-md); + font-size: 0.8125rem; + color: var(--color-text-secondary); +} +.home-starters-list { + list-style: none; + margin: 0; + padding: 0; + display: flex; + flex-direction: column; + gap: var(--spacing-xs); +} +.home-starters-item { + display: flex; + align-items: center; + gap: var(--spacing-md); + padding: var(--spacing-xs) 0; +} +.home-starters-name { + font-weight: 500; + font-size: 0.875rem; + word-break: break-all; +} +.home-starters-badge { + font-size: 0.625rem; +} +.home-starters-size { + margin-left: auto; + font-size: 0.75rem; + color: var(--color-text-muted); + white-space: nowrap; +} + +/* ──────────────────── Models gallery: recommended-for-your-hardware strip ──────────────────── */ + +.rec-models { + margin-bottom: var(--spacing-md); + padding: var(--spacing-md) var(--spacing-lg); +} +.rec-models-head { + display: flex; + align-items: flex-start; + justify-content: space-between; + gap: var(--spacing-md); +} +.rec-models-title { + display: flex; + align-items: center; + gap: var(--spacing-sm); + flex-wrap: wrap; +} +.rec-models-title i { + color: var(--color-primary); +} +.rec-models-note { + font-size: 0.8125rem; + color: var(--color-text-secondary); +} +.rec-models-dismiss { + background: none; + border: none; + color: var(--color-text-muted); + 
cursor: pointer; + padding: 4px; + flex-shrink: 0; +} +.rec-models-dismiss:hover { + color: var(--color-text-primary); +} +.rec-models-grid { + display: grid; + grid-template-columns: repeat(auto-fill, minmax(220px, 1fr)); + gap: var(--spacing-sm); + margin-top: var(--spacing-md); +} +.rec-models-item { + display: flex; + flex-direction: column; + gap: var(--spacing-xs); + padding: var(--spacing-sm) var(--spacing-md); + border: 1px solid var(--color-border-subtle); + border-radius: var(--radius-md); + background: var(--color-bg-primary); +} +.rec-models-item-name { + font-weight: 500; + font-size: 0.8125rem; + word-break: break-all; +} +.rec-models-item-meta { + display: flex; + gap: var(--spacing-sm); + font-size: 0.75rem; + color: var(--color-text-muted); +} +.rec-models-item-fit { + display: inline-flex; + align-items: center; + gap: 4px; +} + /* ──────────────────── Home: drop-in endpoint / API compatibility ──────────────────── */ .home-connect { @@ -8471,3 +8595,56 @@ select.input { .status-pill--error .status-pill__dot { background: var(--color-error); } .status-pill--info .status-pill__dot { background: var(--color-info); } .status-pill--muted .status-pill__dot { background: var(--color-text-muted); } + +/* Nodes: cluster pulse + attention callout (replaces the stat-card strip) */ +.cluster-pulse { + font-size: var(--text-sm); + color: var(--color-text-muted); + margin: 0 0 var(--spacing-lg); +} +.cluster-pulse__strong { color: var(--color-text-primary); font-weight: 600; } + +.attention-callout { + display: flex; + align-items: center; + justify-content: space-between; + gap: var(--spacing-md); + padding: var(--spacing-sm) var(--spacing-md); + border-radius: var(--radius-md); + margin-bottom: var(--spacing-lg); + font-size: var(--text-sm); +} +.attention-callout--warn { + background: var(--color-warning-light); + border: 1px solid var(--color-warning-border); + color: var(--color-text-primary); +} +.attention-callout--error { + background: 
var(--color-error-light); + border: 1px solid var(--color-error-border); + color: var(--color-text-primary); +} + +/* Node roster panels (Nodes page) */ +.node-roster { display: flex; flex-direction: column; gap: var(--spacing-sm); } +.node-panel { + background: var(--color-bg-secondary); + border: 1px solid var(--color-border-subtle); + border-radius: var(--radius-lg); +} +.node-panel__main { padding: var(--spacing-md) var(--spacing-lg); cursor: pointer; } +.node-panel:hover { border-color: var(--color-border); } +.node-panel__head { display: flex; align-items: flex-start; justify-content: space-between; gap: var(--spacing-md); } +.node-panel__id { display: flex; align-items: center; gap: var(--spacing-sm); flex-wrap: wrap; } +.node-panel__name { font-weight: 600; } +.node-panel__meta { display: flex; gap: var(--spacing-lg); margin-top: var(--spacing-sm); color: var(--color-text-muted); font-size: var(--text-xs); } +.node-panel__models { display: flex; flex-wrap: wrap; gap: 6px; margin-top: var(--spacing-sm); } +.model-chip { + display: inline-flex; align-items: center; gap: 5px; + font-family: var(--font-mono); font-size: 0.6875rem; + padding: 2px 8px; border-radius: var(--radius-sm); border: 1px solid; +} +.model-chip__dot { width: 6px; height: 6px; border-radius: 50%; } +.model-chip__state { opacity: 0.85; font-style: normal; } +.node-filter { margin-bottom: var(--spacing-lg); } +.node-detail__metrics { display: flex; gap: var(--spacing-xl); margin: var(--spacing-md) 0 var(--spacing-lg); flex-wrap: wrap; } diff --git a/core/http/react-ui/src/components/ActionMenu.jsx b/core/http/react-ui/src/components/ActionMenu.jsx index 5c58ecd78..55010102c 100644 --- a/core/http/react-ui/src/components/ActionMenu.jsx +++ b/core/http/react-ui/src/components/ActionMenu.jsx @@ -95,9 +95,11 @@ export default function ActionMenu({ items, ariaLabel = 'Actions', triggerLabel, className="action-menu" onKeyDown={handleMenuKeyDown} // Capture focus when the menu opens so arrow keys 
work without the - // user clicking inside first. + // user clicking inside first. preventScroll: the popover is portaled + // and positioned by the trigger rect, so focusing it must not scroll + // the page (that yanked the view to the top before it was placed). tabIndex={-1} - ref={el => { if (el && open) el.focus() }} + ref={el => { if (el && open) el.focus({ preventScroll: true }) }} > {visible.map((item, i) => { if (item.divider) { diff --git a/core/http/react-ui/src/components/ModelSelector.jsx b/core/http/react-ui/src/components/ModelSelector.jsx index 9009524ee..76a118ec9 100644 --- a/core/http/react-ui/src/components/ModelSelector.jsx +++ b/core/http/react-ui/src/components/ModelSelector.jsx @@ -1,8 +1,25 @@ -import { useEffect, useMemo } from 'react' +import { useEffect, useMemo, useCallback } from 'react' import { useModels } from '../hooks/useModels' import SearchableSelect from './SearchableSelect' import { useTranslation } from 'react-i18next' +// Remember the last model the user picked, keyed by capability, so returning to +// a page (Home chat box, Image, TTS, Talk...) defaults to that model instead of +// whatever happens to sort first. Only persisted when a capability key exists — +// `externalOptions` callers pass no capability and get the old first-item +// behaviour. localStorage access is wrapped because private-browsing modes throw. 
+const LAST_MODEL_PREFIX = 'localai_last_model:' + +function readLastModel(capability) { + if (!capability) return null + try { return localStorage.getItem(LAST_MODEL_PREFIX + capability) } catch { return null } +} + +function writeLastModel(capability, model) { + if (!capability || !model) return + try { localStorage.setItem(LAST_MODEL_PREFIX + capability, model) } catch { /* ignore */ } +} + export default function ModelSelector({ value, onChange, capability, className = '', options: externalOptions, loading: externalLoading, @@ -19,16 +36,27 @@ export default function ModelSelector({ const isLoading = externalOptions ? (externalLoading || false) : hookLoading const isDisabled = isLoading || (externalDisabled || false) + // Persist genuine selections so the next visit can restore them. + const handleChange = useCallback((next) => { + writeLastModel(capability, next) + onChange(next) + }, [capability, onChange]) + useEffect(() => { if (modelNames.length > 0 && (!value || !modelNames.includes(value))) { - onChange(modelNames[0]) + // Prefer the remembered model when it's still available; otherwise fall + // back to the first option. Don't re-persist here — auto-select is not a + // user choice, and writing back the stored value would be a harmless but + // pointless round-trip. + const remembered = readLastModel(capability) + onChange(remembered && modelNames.includes(remembered) ? remembered : modelNames[0]) } - }, [modelNames, value, onChange]) + }, [modelNames, value, onChange, capability]) return ( update(i, { min_len: parseInt(e.target.value, 10) || 0 })} + // min={0} only constrains the spinner, not keyboard entry. Clamp a + // typed negative to 0 (a negative floor is meaningless and would + // disable the length filter). When we clamp, force the DOM value + // too: the resulting 0->0 state change is a no-op, so React's + // controlled input would otherwise keep displaying the rejected + // "-5" even though the saved value is 0. 
+ onChange={e => { + const parsed = parseInt(e.target.value, 10) + const n = Math.max(0, parsed || 0) + if (parsed < 0) e.target.value = String(n) + update(i, { min_len: n }) + }} style={{ width: 80, fontSize: '0.8125rem' }} aria-label="Minimum length" /> diff --git a/core/http/react-ui/src/components/Popover.jsx b/core/http/react-ui/src/components/Popover.jsx index 96a9e217e..7d002d348 100644 --- a/core/http/react-ui/src/components/Popover.jsx +++ b/core/http/react-ui/src/components/Popover.jsx @@ -1,10 +1,17 @@ -import { useEffect, useRef, useState, useCallback } from 'react' +import { useEffect, useLayoutEffect, useRef, useState, useCallback } from 'react' +import { createPortal } from 'react-dom' // Minimal popover: positions itself below-right of the trigger's bounding box, // flips above when there isn't room below, closes on outside click or Escape, // returns focus to the trigger. Uses the existing .card surface so it picks // up theme/border/shadow automatically — no new theming work. // +// Rendered through a portal on document.body: the popover is position:fixed and +// positioned from the trigger's viewport rect, so it must escape any ancestor +// that establishes a containing block (a row/card with a hover `transform` +// would otherwise re-anchor `position:fixed` to itself, throwing the menu to +// the wrong spot and making it unusable). +// // Props: // anchor: ref to the trigger DOMElement (required) // open: boolean @@ -30,7 +37,9 @@ export default function Popover({ anchor, open, onClose, children, ariaLabel }) setPos({ top, left: Math.max(8, left), flipped }) }, [anchor]) - useEffect(() => { + // useLayoutEffect so we measure + place the popover before the browser + // paints — otherwise it flashes at its initial {0,0} for a frame. 
+ useLayoutEffect(() => { if (!open) return reposition() window.addEventListener('resize', reposition) @@ -65,14 +74,15 @@ export default function Popover({ anchor, open, onClose, children, ariaLabel }) if (!open && anchor?.current) { // requestAnimationFrame so the close is painted before focus jumps; // otherwise screen readers announce the trigger mid-transition. - const raf = requestAnimationFrame(() => anchor.current?.focus?.()) + // preventScroll: focusing the trigger must not yank the page scroll. + const raf = requestAnimationFrame(() => anchor.current?.focus?.({ preventScroll: true })) return () => cancelAnimationFrame(raf) } }, [open, anchor]) if (!open) return null - return ( + return createPortal(
{children} -
+ , + document.body ) } diff --git a/core/http/react-ui/src/components/RecommendedModels.jsx b/core/http/react-ui/src/components/RecommendedModels.jsx new file mode 100644 index 000000000..7620406c8 --- /dev/null +++ b/core/http/react-ui/src/components/RecommendedModels.jsx @@ -0,0 +1,86 @@ +import { useState } from 'react' +import { useTranslation } from 'react-i18next' +import { modelsApi } from '../utils/api' +import { useRecommendedModels, isNvfp4Name } from '../hooks/useRecommendedModels' + +const DISMISS_KEY = 'localai_rec_models_dismissed' + +// "Recommended for your hardware" strip at the top of the Models gallery. Shares +// the hardware-fit ranking with the empty-state starter widget via +// useRecommendedModels, but styled for the gallery page and dismissible (the +// gallery is a repeat-visit surface, so it shouldn't nag). +export default function RecommendedModels({ addToast }) { + const { t } = useTranslation('models') + const { recommended, tier, loading } = useRecommendedModels({ count: 4 }) + const [installing, setInstalling] = useState(() => new Set()) + const [dismissed, setDismissed] = useState(() => { + try { return localStorage.getItem(DISMISS_KEY) === '1' } catch { return false } + }) + + if (loading || dismissed) return null + if (!recommended || recommended.length === 0) return null + + const dismiss = () => { + try { localStorage.setItem(DISMISS_KEY, '1') } catch { /* ignore */ } + setDismissed(true) + } + + const install = async (name) => { + setInstalling(prev => new Set(prev).add(name)) + try { + await modelsApi.install(name) + addToast?.(t('recommended.installStarted', { model: name }), 'success') + } catch (err) { + addToast?.(t('recommended.installFailed', { message: err.message }), 'error') + setInstalling(prev => { + const next = new Set(prev) + next.delete(name) + return next + }) + } + } + + const isGpu = tier.id !== 'cpu' + + return ( +
+
+
+