Mirror of https://github.com/mudler/LocalAI.git, synced 2026-05-23 16:20:01 -04:00
Compare commits: docs/readm ... feat/distr (53 commits)
| Author | SHA1 | Date | |
|---|---|---|---|
| | 3280b9a287 | | |
| | 375bf1929d | | |
| | 9a7f5e68bd | | |
| | 6b63b47f61 | | |
| | f4036fa83f | | |
| | 3810fe1a1e | | |
| | bdfa5e934a | | |
| | deca6dbdad | | |
| | 60549a8a60 | | |
| | 54728e292f | | |
| | 86fd62233f | | |
| | 41ed8ced70 | | |
| | 05e94bd9e7 | | |
| | 8d124d080f | | |
| | 2da1a4d230 | | |
| | 988430c850 | | |
| | b336d9c626 | | |
| | f384c64a91 | | |
| | e9d8e92988 | | |
| | 5b0196c7d0 | | |
| | c8d63a1003 | | |
| | d9cb0d6133 | | |
| | f5c268deac | | |
| | 8931a2ad31 | | |
| | e16e758dff | | |
| | 1c45227346 | | |
| | fbe4f0a99b | | |
| | d733c9cd13 | | |
| | 703b4fcae8 | | |
| | 73aacad2f9 | | |
| | 806ea24ff4 | | |
| | 385de3705e | | |
| | 21eace40ec | | |
| | 24505e57f5 | | |
| | d09706dc60 | | |
| | 08e393f7db | | |
| | 47cc3dc8d7 | | |
| | 83b384de97 | | |
| | 487e3fd2a4 | | |
| | 9ab3496de2 | | |
| | c4511be33a | | |
| | 551ebdb57a | | |
| | 1d0de757c3 | | |
| | e5337039b0 | | |
| | 1c9592c77f | | |
| | 3db60b57e6 | | |
| | 13734ae9fa | | |
| | c0920f3273 | | |
| | 7c1934b183 | | |
| | 5e062b4d1f | | |
| | 4906cbad04 | | |
| | c755cd5ab5 | | |
| | 0fb04f7ac3 | | |
@@ -43,7 +43,7 @@ If you add a new language bucket, `scripts/changed-backends.js` also needs a bra
 
 **Additional build types you may need:**
 - ROCm/HIP: Use `build-type: 'hipblas'` with `base-image: "rocm/dev-ubuntu-24.04:7.2.1"`
-- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"`
+- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"`
 - L4T (ARM): Use `build-type: 'l4t'` with `platforms: 'linux/arm64'` and `runs-on: 'ubuntu-24.04-arm'`
 
 ## 3. Add Backend Metadata to `backend/index.yaml`
111
.agents/ci-caching.md
Normal file
@@ -0,0 +1,111 @@
# CI Build Caching

Container builds, both the root LocalAI image (`Dockerfile`) and the per-backend images (`backend/Dockerfile.*`), share a registry-backed BuildKit cache. This file explains how that cache is laid out, what invalidates it, and how to bypass it.

## Cache layout

- **Cache registry**: `quay.io/go-skynet/ci-cache`
- **One tag per matrix entry**, derived from the existing `tag-suffix`:
  - Backend builds (`backend_build.yml`): `cache<tag-suffix>`
    - e.g. `cache-gpu-nvidia-cuda-12-llama-cpp`, `cache-cpu-vllm`, `cache-nvidia-l4t-cuda-13-arm64-vllm`
  - Root image builds (`image_build.yml`): `cache-localai<tag-suffix>`
    - e.g. `cache-localai-gpu-nvidia-cuda-12`, `cache-localai-gpu-vulkan`
- Each tag stores a multi-arch BuildKit cache manifest (`mode=max`), so every intermediate stage is re-usable, not just the final image.
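
The naming scheme above can be sketched as a small shell helper. This is an illustration only: the function name `cache_ref` and its `backend`/`image` argument are hypothetical, and the workflows build the reference inline rather than through a script.

```shell
#!/bin/sh
# Hypothetical helper: derive the registry cache reference for a matrix entry.
# Only the registry name and the "cache" / "cache-localai" prefixes come from
# the layout above; the function and its arguments are made up for illustration.
CACHE_REPO="quay.io/go-skynet/ci-cache"

cache_ref() {
  kind="$1"        # "backend" (backend_build.yml) or "image" (image_build.yml)
  tag_suffix="$2"  # the matrix entry's tag-suffix, e.g. "-cpu-vllm"
  case "$kind" in
    backend) prefix="cache" ;;
    image)   prefix="cache-localai" ;;
    *) echo "unknown kind: $kind" >&2; return 1 ;;
  esac
  printf '%s:%s%s\n' "$CACHE_REPO" "$prefix" "$tag_suffix"
}

cache_ref backend "-gpu-nvidia-cuda-12-llama-cpp"
cache_ref image "-gpu-vulkan"
```

Because the tag is a pure function of `tag-suffix`, two matrix entries with the same suffix would resolve to the same cache tag.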

## Read/write semantics

| Trigger | `cache-from` | `cache-to` |
|---|---|---|
| `push` to `master` / tag | yes | yes (`mode=max,ignore-error=true`) |
| `pull_request` | yes | **no** |

PR builds read master's warm cache but never write, which prevents PRs from polluting the shared cache with their experimental state. After merge, the master build for that matrix entry refreshes the cache.

`ignore-error=true` on the write side means a transient quay push failure does not fail the build; the next master push retries.
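
As a rough sketch of the same policy outside GitHub Actions, the table can be expressed as `docker buildx build` cache flags. The `--cache-from`/`--cache-to` syntax is standard buildx; the function name `cache_flags` and its arguments are hypothetical and do not appear in the workflows.

```shell
#!/bin/sh
# Hypothetical sketch of the read/write policy as `docker buildx build` flags.
# The function name and arguments are illustrative, not taken from the workflows.
cache_flags() {
  event="$1"  # GitHub event name, e.g. "push" or "pull_request"
  ref="$2"    # registry cache reference for this matrix entry
  # Every build reads the cache.
  flags="--cache-from type=registry,ref=$ref"
  # Only non-PR builds write it back; ignore-error keeps a failed cache
  # push from failing the build.
  if [ "$event" != "pull_request" ]; then
    flags="$flags --cache-to type=registry,ref=$ref,mode=max,ignore-error=true"
  fi
  printf '%s\n' "$flags"
}

cache_flags pull_request "quay.io/go-skynet/ci-cache:cache-cpu-vllm"
cache_flags push "quay.io/go-skynet/ci-cache:cache-cpu-vllm"
```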

## Self-warming, no separate populator

There is no cron job that pre-warms the cache. The production builds *are* the populator. The first master build of a given matrix entry pays the cold cost; subsequent master builds of the same entry reuse everything that hasn't changed (apt installs, gRPC compile in `Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`, Python wheel installs, etc.).

Historically there was a `generate_grpc_cache.yaml` cron that targeted a `grpc` stage in the root Dockerfile. That stage was removed in July 2025 and the cron silently failed every night for 9 months without writing anything. It was deleted along with the registry-cache rollout.

## The `DEPS_REFRESH` cache-buster (Python backends)

Every Python backend goes through the shared `backend/Dockerfile.python`, which ends with:

```dockerfile
ARG DEPS_REFRESH=initial
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
```

Most Python backends ship `requirements*.txt` files that **do not pin every transitive dep** (`torch`, `transformers`, `vllm`, `diffusers`, etc. are listed without a `==` pin, or with `>=` lower bounds only). With a warm BuildKit cache, the cache key of the `make` layer depends only on the Dockerfile instructions and the `COPY`ed source, not on what `pip install` would resolve if the layer actually ran. So a warm cache would ship the *first* version of `vllm` ever cached and never pick up upstream releases.

`DEPS_REFRESH` defends against that:

- `backend_build.yml` computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) before each build and passes it as a build-arg.
- The `RUN ... make` layer's BuildKit hash now includes that string, so the layer invalidates **at most once per week**, automatically picking up newer wheels.
- Within a week, builds stay warm.

This applies only to `Dockerfile.python` because:

- Go (`Dockerfile.golang`) pins versions in `go.mod` / `go.sum`.
- Rust (`Dockerfile.rust`) pins via `Cargo.lock`.
- C++ backends (`Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`) clone gRPC at a pinned tag (`v1.65.0`) and llama.cpp at a pinned commit; their inputs don't drift between rebuilds.

### Adjusting the cadence

If you need a faster refresh (e.g. while debugging an upstream flake), change the `date` format to daily (`+%Y-%m-%d`) or hourly (`+%Y-%m-%d-%H`). If you need a one-shot rebuild for a specific backend without changing the schedule, append a marker to the tag-suffix in the matrix or temporarily delete that backend's cache tag in quay.
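
A minimal sketch of the three cadences, assuming a POSIX `date`. The function name `deps_refresh_key` and its `weekly|daily|hourly` argument are hypothetical; the workflow itself inlines the weekly `date` call.

```shell
#!/bin/sh
# Hypothetical helper: compute a DEPS_REFRESH value for a chosen cadence.
# The three `date` formats are the ones named above; the function name and
# its argument are illustrative.
deps_refresh_key() {
  case "${1:-weekly}" in
    weekly) date -u +%Y-W%V ;;       # ISO week number, e.g. 2026-W17
    daily)  date -u +%Y-%m-%d ;;
    hourly) date -u +%Y-%m-%d-%H ;;
    *) echo "unknown cadence: $1" >&2; return 1 ;;
  esac
}

echo "DEPS_REFRESH=$(deps_refresh_key weekly)"
```

One caveat with the weekly format: `%V` is the ISO week number while `%Y` is the calendar year, so the key can change twice within one ISO week around New Year (1 January 2027, for example, falls in ISO week 53 of 2026 and would yield `2027-W53`). That costs at most one extra cold layer; `%G` (the ISO week-based year) would give strictly weekly buckets if that ever matters.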

## Manually evicting cache

To force a fully cold build for one backend or the whole image:

```bash
# Delete a single tag (requires quay credentials with admin on the repo)
curl -X DELETE \
  -H "Authorization: Bearer ${QUAY_TOKEN}" \
  https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/cache-gpu-nvidia-cuda-12-vllm

# List all tags
curl -s -H "Authorization: Bearer ${QUAY_TOKEN}" \
  "https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/?limit=100" | jq '.tags[].name'
```

Eviction is rarely needed in normal operation: `DEPS_REFRESH` handles weekly drift, source changes invalidate naturally, and the per-entry cache tag keeps the cache scoped to one matrix entry, so a stale tag never bleeds into a different build.

## What the cache **does not** cover

- The "Free Disk Space" / "Release space from worker" steps run on every job; these reclaim ~6 GB on `ubuntu-latest` runners. They are runner-state cleanup, not Docker, and BuildKit caches don't apply.
- Intermediate artifacts of `Build and push (PR)` are not pushed anywhere; PRs only build for verification.
- Darwin builds (see below): macOS runners have no Docker daemon, so the registry-backed BuildKit cache cannot apply.

## Darwin native caches

`backend_build_darwin.yml` runs natively on `macOS-14` GitHub-hosted runners: there is no Docker, no BuildKit, no cross-job registry cache. Instead, the reusable workflow uses `actions/cache@v4` for four native caches that mirror the spirit of the Linux cache (warm by default, weekly refresh for unpinned Python deps, PRs read-only).

| Cache | Path(s) | Key | Scope |
|---|---|---|---|
| Go modules + build | `~/go/pkg/mod`, `~/Library/Caches/go-build` | `go.sum` (managed by `actions/setup-go@v5` `cache: true`) | All darwin jobs |
| Homebrew | `~/Library/Caches/Homebrew/downloads`, selected `/opt/homebrew/Cellar/*` | hash of `backend_build_darwin.yml` | All darwin jobs |
| ccache (llama.cpp CMake) | `~/Library/Caches/ccache` | pinned `LLAMA_VERSION` from `backend/cpp/llama-cpp/Makefile` | `inputs.backend == 'llama-cpp'` only |
| Python wheels (uv + pip) | `~/Library/Caches/pip`, `~/Library/Caches/uv` | `inputs.backend` + ISO week (`+%Y-W%V`) + hash of that backend's `requirements*.txt` | `inputs.lang == 'python'` only |

Read/write semantics match the BuildKit cache: `actions/cache/restore` runs every time, `actions/cache/save` is gated on `github.event_name != 'pull_request'`. PRs read master's warm cache but never write back.

The Python wheel cache uses the same ISO-week cache-buster as the Linux `DEPS_REFRESH` build-arg: same problem (unpinned `torch`/`mlx`/`diffusers`/`transformers` resolve to fresh wheels weekly), same ~one-cold-rebuild-per-week solution.
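
To make the key's inputs concrete, here is a hedged approximation of the Python wheel cache key in plain shell. GitHub's `hashFiles()` uses its own algorithm, so the digest will not match the real key; the function name `pyenv_cache_key` and the directory argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical approximation of the Darwin Python wheel cache key.
# It only shows which inputs the key depends on: backend name, ISO week,
# and the contents of that backend's requirements*.txt files.
pyenv_cache_key() {
  backend="$1"   # e.g. "mlx"
  req_dir="$2"   # directory holding that backend's requirements*.txt
  week=$(date -u +%Y-W%V)
  # Use sha256sum where available, otherwise fall back to shasum (macOS).
  if command -v sha256sum >/dev/null 2>&1; then
    digest=$(cat "$req_dir"/requirements*.txt | sha256sum | cut -d' ' -f1)
  else
    digest=$(cat "$req_dir"/requirements*.txt | shasum -a 256 | cut -d' ' -f1)
  fi
  printf 'pyenv-darwin-%s-%s-%s\n' "$backend" "$week" "$digest"
}

# Example (path is illustrative): pyenv_cache_key mlx backend/python/mlx
```

A change to either the week or any requirements file produces a new key, while the workflow's `restore-keys` prefix (`pyenv-darwin-<backend>-`) still lets a build start from the most recent older entry for the same backend.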

The brew Cellar cache requires `HOMEBREW_NO_AUTO_UPDATE=1` and `HOMEBREW_NO_INSTALL_CLEANUP=1` (set as job-level env). Without those, `brew install` would mutate the very directories that were just restored, defeating the cache.

For ccache, the workflow exports `CMAKE_ARGS=… -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache` via `$GITHUB_ENV` before running `make build-darwin-go-backend`. The Makefile in `backend/cpp/llama-cpp/` already forwards `CMAKE_ARGS` through to each variant build (`fallback`, `grpc`, `rpc-server`), so no script changes are needed. The three variants share most translation units, so ccache dedupes object files across them.

### Cache budget on Darwin

GitHub Actions caches are limited to 10 GB per repo. Steady-state worst case: ~800 MB Go cache + ~2 GB brew Cellar + up to 2 GB ccache + ~1.5 GB × 5 python backends, roughly 12 GB in total, which is already over the cap. If the cap is hit, prefer collapsing the per-backend Python keys into a shared `pyenv-darwin-shared-<week>` key (accepts more cross-backend churn for a smaller footprint) before reducing other caches.
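
The estimate can be worked through with a one-line calculation. The sizes are the rough figures quoted above, in GB; the function name `darwin_cache_budget_gb` is illustrative.

```shell
#!/bin/sh
# Worked version of the budget estimate: Go + brew + ccache + per-backend
# Python caches, using the approximate sizes from the text (GB).
darwin_cache_budget_gb() {
  python_backends="$1"
  awk -v n="$python_backends" 'BEGIN { printf "%.1f\n", 0.8 + 2 + 2 + 1.5 * n }'
}

darwin_cache_budget_gb 5   # prints 12.3, above the 10 GB cap
darwin_cache_budget_gb 3   # prints 9.3, the largest count that stays under it
```

Under these assumptions only about three per-backend Python caches fit next to the other caches, which is why collapsing them into one shared key is the first lever to pull.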

## Touching the cache pipeline

When changing `image_build.yml`, `backend_build.yml`, or any of the `backend/Dockerfile.*` files:

1. **Don't drop `DEPS_REFRESH=...` from the build-args** without a replacement strategy (lockfiles, pinned requirements). Otherwise master will silently freeze on whichever versions were cached at the time.
2. **Keep `tag-suffix` unique per matrix entry**: it's the cache namespace. Two matrix entries sharing a tag-suffix would clobber each other's cache.
3. **Keep `cache-to` gated on `github.event_name != 'pull_request'`**: PRs must not write.
4. **Keep `ignore-error=true` on `cache-to`**: quay registry hiccups must not fail builds.
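
Rules 1 and 4 lend themselves to a quick textual check before merging. The following is a hypothetical sketch, not part of the repository: `check_cache_invariants` is an invented name, and because it greps plain text rather than parsing YAML, a pass is a hint rather than proof.

```shell
#!/bin/sh
# Hypothetical pre-merge check for two of the rules above, applied to a
# workflow file such as backend_build.yml.
check_cache_invariants() {
  file="$1"
  status=0
  # Rule 1: the DEPS_REFRESH build-arg must still be passed.
  grep -q 'DEPS_REFRESH=' "$file" \
    || { echo "missing DEPS_REFRESH build-arg" >&2; status=1; }
  # Rule 4: every cache-to line must carry ignore-error=true.
  if grep 'cache-to:' "$file" | grep -qv 'ignore-error=true'; then
    echo "cache-to without ignore-error=true" >&2
    status=1
  fi
  return "$status"
}

# Example: check_cache_invariants .github/workflows/backend_build.yml
```

Rules 2 and 3 depend on the matrix and on step-level `if:` conditions, so they still need a human review.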
116
.github/workflows/backend.yml
vendored
@@ -141,7 +141,7 @@ jobs:
|
||||
- build-type: ''
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
platforms: 'linux/amd64,linux/arm64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-cpu-whisperx'
|
||||
runs-on: 'ubuntu-latest'
|
||||
@@ -154,7 +154,7 @@ jobs:
|
||||
- build-type: ''
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64'
|
||||
platforms: 'linux/amd64,linux/arm64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-cpu-faster-whisper'
|
||||
runs-on: 'ubuntu-latest'
|
||||
@@ -920,6 +920,32 @@ jobs:
|
||||
backend: "turboquant"
|
||||
dockerfile: "./backend/Dockerfile.turboquant"
|
||||
context: "./"
|
||||
- build-type: 'cublas'
|
||||
cuda-major-version: "13"
|
||||
cuda-minor-version: "0"
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-nvidia-cuda-13-vllm'
|
||||
runs-on: 'arc-runner-set'
|
||||
base-image: "ubuntu:24.04"
|
||||
skip-drivers: 'false'
|
||||
backend: "vllm"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./"
|
||||
ubuntu-version: '2404'
|
||||
- build-type: 'cublas'
|
||||
cuda-major-version: "13"
|
||||
cuda-minor-version: "0"
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-nvidia-cuda-13-vllm-omni'
|
||||
runs-on: 'arc-runner-set'
|
||||
base-image: "ubuntu:24.04"
|
||||
skip-drivers: 'false'
|
||||
backend: "vllm-omni"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./"
|
||||
ubuntu-version: '2404'
|
||||
- build-type: 'cublas'
|
||||
cuda-major-version: "13"
|
||||
cuda-minor-version: "0"
|
||||
@@ -1076,6 +1102,45 @@ jobs:
|
||||
backend: "diffusers"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./"
|
||||
- build-type: 'l4t'
|
||||
cuda-major-version: "13"
|
||||
cuda-minor-version: "0"
|
||||
platforms: 'linux/arm64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-nvidia-l4t-cuda-13-arm64-vllm'
|
||||
runs-on: 'ubuntu-24.04-arm'
|
||||
base-image: "ubuntu:24.04"
|
||||
skip-drivers: 'false'
|
||||
ubuntu-version: '2404'
|
||||
backend: "vllm"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./"
|
||||
- build-type: 'l4t'
|
||||
cuda-major-version: "13"
|
||||
cuda-minor-version: "0"
|
||||
platforms: 'linux/arm64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-nvidia-l4t-cuda-13-arm64-vllm-omni'
|
||||
runs-on: 'ubuntu-24.04-arm'
|
||||
base-image: "ubuntu:24.04"
|
||||
skip-drivers: 'false'
|
||||
ubuntu-version: '2404'
|
||||
backend: "vllm-omni"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./"
|
||||
- build-type: 'l4t'
|
||||
cuda-major-version: "13"
|
||||
cuda-minor-version: "0"
|
||||
platforms: 'linux/arm64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-nvidia-l4t-cuda-13-arm64-sglang'
|
||||
runs-on: 'ubuntu-24.04-arm'
|
||||
base-image: "ubuntu:24.04"
|
||||
skip-drivers: 'false'
|
||||
ubuntu-version: '2404'
|
||||
backend: "sglang"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./"
|
||||
- build-type: 'l4t'
|
||||
cuda-major-version: "13"
|
||||
cuda-minor-version: "0"
|
||||
@@ -1671,7 +1736,7 @@ jobs:
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-rerankers'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
|
||||
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
|
||||
skip-drivers: 'false'
|
||||
backend: "rerankers"
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
@@ -1684,7 +1749,7 @@ jobs:
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-intel-sycl-f32-llama-cpp'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
|
||||
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
|
||||
skip-drivers: 'false'
|
||||
backend: "llama-cpp"
|
||||
dockerfile: "./backend/Dockerfile.llama-cpp"
|
||||
@@ -2877,6 +2942,49 @@ jobs:
|
||||
dockerfile: "./backend/Dockerfile.python"
|
||||
context: "./"
|
||||
ubuntu-version: '2404'
|
||||
# sherpa-onnx CPU
|
||||
- build-type: ''
|
||||
cuda-major-version: ""
|
||||
cuda-minor-version: ""
|
||||
platforms: 'linux/amd64,linux/arm64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-cpu-sherpa-onnx'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "ubuntu:24.04"
|
||||
skip-drivers: 'false'
|
||||
backend: "sherpa-onnx"
|
||||
dockerfile: "./backend/Dockerfile.golang"
|
||||
context: "./"
|
||||
ubuntu-version: '2404'
|
||||
# sherpa-onnx CUDA 12
|
||||
- build-type: 'cublas'
|
||||
cuda-major-version: "12"
|
||||
cuda-minor-version: "8"
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-nvidia-cuda-12-sherpa-onnx'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "ubuntu:24.04"
|
||||
skip-drivers: 'false'
|
||||
backend: "sherpa-onnx"
|
||||
dockerfile: "./backend/Dockerfile.golang"
|
||||
context: "./"
|
||||
ubuntu-version: '2404'
|
||||
# sherpa-onnx CUDA 13 — requires onnxruntime 1.24.x+ for the
|
||||
# gpu_cuda13 tarball; sherpa-onnx SHERPA_COMMIT pins to v1.12.39.
|
||||
- build-type: 'cublas'
|
||||
cuda-major-version: "13"
|
||||
cuda-minor-version: "0"
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-nvidia-cuda-13-sherpa-onnx'
|
||||
runs-on: 'ubuntu-latest'
|
||||
base-image: "ubuntu:24.04"
|
||||
skip-drivers: 'false'
|
||||
backend: "sherpa-onnx"
|
||||
dockerfile: "./backend/Dockerfile.golang"
|
||||
context: "./"
|
||||
ubuntu-version: '2404'
|
||||
backend-jobs-darwin:
|
||||
uses: ./.github/workflows/backend_build_darwin.yml
|
||||
strategy:
|
||||
|
||||
16
.github/workflows/backend_build.yml
vendored
@@ -208,6 +208,15 @@ jobs:
 username: ${{ secrets.quayUsername }}
 password: ${{ secrets.quayPassword }}
 
+# Weekly cache-buster for the per-backend `make` step. Most Python
+# backends list unpinned deps (torch, transformers, vllm, ...), so a
+# warm cache freezes upstream versions indefinitely. Rolling this
+# weekly forces a re-resolve of the install layer at most once per
+# week, picking up newer wheels without a full cold rebuild.
+- name: Compute deps refresh key
+id: deps_refresh
+run: echo "key=$(date -u +%Y-W%V)" >> "$GITHUB_OUTPUT"
+
 - name: Build and push
 uses: docker/build-push-action@v7
 if: github.event_name != 'pull_request'
@@ -222,9 +231,11 @@ jobs:
 BACKEND=${{ inputs.backend }}
 UBUNTU_VERSION=${{ inputs.ubuntu-version }}
 AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
+DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
 context: ${{ inputs.context }}
 file: ${{ inputs.dockerfile }}
-cache-from: type=gha
+cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
+cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }},mode=max,ignore-error=true
 platforms: ${{ inputs.platforms }}
 push: ${{ github.event_name != 'pull_request' }}
 tags: ${{ steps.meta.outputs.tags }}
@@ -244,9 +255,10 @@ jobs:
 BACKEND=${{ inputs.backend }}
 UBUNTU_VERSION=${{ inputs.ubuntu-version }}
 AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
+DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
 context: ${{ inputs.context }}
 file: ${{ inputs.dockerfile }}
-cache-from: type=gha
+cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
 platforms: ${{ inputs.platforms }}
 push: ${{ env.quay_username != '' }}
 tags: ${{ steps.meta_pull_request.outputs.tags }}
131
.github/workflows/backend_build_darwin.yml
vendored
@@ -48,6 +48,13 @@ jobs:
|
||||
strategy:
|
||||
matrix:
|
||||
go-version: ['${{ inputs.go-version }}']
|
||||
env:
|
||||
# Keep the brew Cellar stable across cache restores. Without these,
|
||||
# `brew install` would auto-update brew itself and re-link formulas,
|
||||
# mutating the very paths the cache just restored.
|
||||
HOMEBREW_NO_AUTO_UPDATE: '1'
|
||||
HOMEBREW_NO_INSTALL_CLEANUP: '1'
|
||||
HOMEBREW_NO_ANALYTICS: '1'
|
||||
steps:
|
||||
- name: Clone
|
||||
uses: actions/checkout@v6
|
||||
@@ -58,21 +65,141 @@ jobs:
|
||||
uses: actions/setup-go@v5
|
||||
with:
|
||||
go-version: ${{ matrix.go-version }}
|
||||
cache: false
|
||||
# Caches ~/go/pkg/mod and ~/Library/Caches/go-build keyed on go.sum.
|
||||
# Shared across every darwin matrix entry — first job in a run warms
|
||||
# it, the rest hit warm.
|
||||
cache: true
|
||||
|
||||
# You can test your matrix by printing the current Go version
|
||||
- name: Display Go version
|
||||
run: go version
|
||||
|
||||
# ---- Homebrew cache ----
|
||||
# macOS runners have no Docker daemon, so the BuildKit registry cache used
|
||||
# for Linux backend images (see .agents/ci-caching.md) doesn't apply here.
|
||||
# We cache the brew downloads + Cellar entries for the formulas we install
|
||||
# below. Read on every run, write only on master/tag pushes — same policy
|
||||
# as the Linux registry cache.
|
||||
- name: Restore Homebrew cache
|
||||
id: brew-cache
|
||||
uses: actions/cache/restore@v4
|
||||
with:
|
||||
path: |
|
||||
~/Library/Caches/Homebrew/downloads
|
||||
/opt/homebrew/Cellar/protobuf
|
||||
/opt/homebrew/Cellar/grpc
|
||||
/opt/homebrew/Cellar/protoc-gen-go
|
||||
/opt/homebrew/Cellar/protoc-gen-go-grpc
|
||||
/opt/homebrew/Cellar/libomp
|
||||
/opt/homebrew/Cellar/llvm
|
||||
/opt/homebrew/Cellar/ccache
|
||||
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
|
||||
|
||||
- name: Dependencies
|
||||
run: |
|
||||
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm
|
||||
# ccache is always installed (used by the llama-cpp variant build) so
|
||||
# the brew cache content stays stable across every backend in the
|
||||
# matrix — they all share one cache key.
|
||||
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache
|
||||
|
||||
- name: Save Homebrew cache
|
||||
if: github.event_name != 'pull_request' && steps.brew-cache.outputs.cache-hit != 'true'
|
||||
uses: actions/cache/save@v4
|
||||
with:
|
||||
path: |
|
||||
~/Library/Caches/Homebrew/downloads
|
||||
/opt/homebrew/Cellar/protobuf
|
||||
/opt/homebrew/Cellar/grpc
|
||||
/opt/homebrew/Cellar/protoc-gen-go
|
||||
/opt/homebrew/Cellar/protoc-gen-go-grpc
|
||||
/opt/homebrew/Cellar/libomp
|
||||
/opt/homebrew/Cellar/llvm
|
||||
/opt/homebrew/Cellar/ccache
|
||||
key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
|
||||
|
||||
# ---- ccache for llama.cpp CMake builds ----
|
||||
# Three CMake variants (fallback, grpc, rpc-server) compile the same
|
||||
# llama.cpp source tree with overlapping flags — ccache dedupes object
|
||||
# files across them. Key on the pinned LLAMA_VERSION so a pin bump
|
||||
# invalidates cleanly; restore-keys fall back to the latest entry for the
|
||||
# same pin so unchanged TUs stay warm even when the cache is fresh.
|
||||
- name: Compute llama.cpp version
|
||||
if: inputs.backend == 'llama-cpp'
|
||||
id: llama-version
|
||||
run: |
|
||||
version=$(grep '^LLAMA_VERSION' backend/cpp/llama-cpp/Makefile | head -1 | cut -d= -f2 | cut -d'?' -f1 | tr -d ' ')
|
||||
echo "version=${version}" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Restore ccache
|
||||
if: inputs.backend == 'llama-cpp'
|
||||
id: ccache-cache
|
||||
uses: actions/cache/restore@v4
|
||||
with:
|
||||
path: ~/Library/Caches/ccache
|
||||
key: ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-${{ github.run_id }}
|
||||
restore-keys: |
|
||||
ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-
|
||||
|
||||
- name: Configure ccache
|
||||
if: inputs.backend == 'llama-cpp'
|
||||
run: |
|
||||
mkdir -p "$HOME/Library/Caches/ccache"
|
||||
ccache -M 2G
|
||||
ccache -z
|
||||
# llama-cpp-darwin.sh reads CMAKE_ARGS / CCACHE_DIR from env.
|
||||
{
|
||||
echo "CMAKE_ARGS=${CMAKE_ARGS:-} -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache"
|
||||
echo "CCACHE_DIR=$HOME/Library/Caches/ccache"
|
||||
} >> "$GITHUB_ENV"
|
||||
|
||||
# ---- Python wheel cache (uv + pip) ----
|
||||
# Mirrors the Linux DEPS_REFRESH cadence (see .agents/ci-caching.md): the
|
||||
# ISO-week segment of the cache key forces at most one cold rebuild per
|
||||
# backend per week, automatically picking up newer wheels for unpinned
|
||||
# deps (torch, mlx, diffusers, …). Restore-keys fall back to the most
|
||||
# recent build of the same backend so off-week PRs still hit warm.
|
||||
- name: Compute weekly cache bucket
|
||||
if: inputs.lang == 'python'
|
||||
id: weekly
|
||||
run: echo "bucket=$(date -u +%Y-W%V)" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Restore Python wheel cache
|
||||
if: inputs.lang == 'python'
|
||||
id: pyenv-cache
|
||||
uses: actions/cache/restore@v4
|
||||
with:
|
||||
path: |
|
||||
~/Library/Caches/pip
|
||||
~/Library/Caches/uv
|
||||
key: pyenv-darwin-${{ inputs.backend }}-${{ steps.weekly.outputs.bucket }}-${{ hashFiles(format('backend/python/{0}/requirements*.txt', inputs.backend)) }}
|
||||
restore-keys: |
|
||||
pyenv-darwin-${{ inputs.backend }}-
|
||||
|
||||
- name: Build ${{ inputs.backend }}-darwin
|
||||
run: |
|
||||
make protogen-go
|
||||
BACKEND=${{ inputs.backend }} BUILD_TYPE=${{ inputs.build-type }} USE_PIP=${{ inputs.use-pip }} make build-darwin-${{ inputs.lang }}-backend
|
||||
|
||||
- name: ccache stats
|
||||
if: inputs.backend == 'llama-cpp'
|
||||
run: ccache -s
|
||||
|
||||
- name: Save ccache
|
||||
if: inputs.backend == 'llama-cpp' && github.event_name != 'pull_request'
|
||||
uses: actions/cache/save@v4
|
||||
with:
|
||||
path: ~/Library/Caches/ccache
|
||||
key: ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-${{ github.run_id }}
|
||||
|
||||
- name: Save Python wheel cache
|
||||
if: inputs.lang == 'python' && github.event_name != 'pull_request' && steps.pyenv-cache.outputs.cache-hit != 'true'
|
||||
uses: actions/cache/save@v4
|
||||
with:
|
||||
path: |
|
||||
~/Library/Caches/pip
|
||||
~/Library/Caches/uv
|
||||
key: pyenv-darwin-${{ inputs.backend }}-${{ steps.weekly.outputs.bucket }}-${{ hashFiles(format('backend/python/{0}/requirements*.txt', inputs.backend)) }}
|
||||
|
||||
- name: Upload ${{ inputs.backend }}.tar
|
||||
uses: actions/upload-artifact@v7
|
||||
with:
|
||||
|
||||
2
.github/workflows/gallery-agent.yaml
vendored
@@ -2,7 +2,7 @@ name: Gallery Agent
|
||||
on:
|
||||
|
||||
schedule:
|
||||
- cron: '0 */3 * * *' # Run every 4 hours
|
||||
- cron: '0 */12 * * *' # Run every 12 hours
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
search_term:
|
||||
|
||||
96
.github/workflows/generate_grpc_cache.yaml
vendored
@@ -1,96 +0,0 @@
|
||||
name: 'generate and publish GRPC docker caches'
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
|
||||
schedule:
|
||||
# daily at midnight
|
||||
- cron: '0 0 * * *'
|
||||
|
||||
concurrency:
|
||||
group: grpc-cache-${{ github.head_ref || github.ref }}-${{ github.repository }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
generate_caches:
|
||||
if: github.repository == 'mudler/LocalAI'
|
||||
strategy:
|
||||
matrix:
|
||||
include:
|
||||
- grpc-base-image: ubuntu:24.04
|
||||
runs-on: 'ubuntu-latest'
|
||||
platforms: 'linux/amd64,linux/arm64'
|
||||
runs-on: ${{matrix.runs-on}}
|
||||
steps:
|
||||
- name: Release space from worker
|
||||
if: matrix.runs-on == 'ubuntu-latest'
|
||||
run: |
|
||||
echo "Listing top largest packages"
|
||||
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
|
||||
head -n 30 <<< "${pkgs}"
|
||||
echo
|
||||
df -h
|
||||
echo
|
||||
sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
|
||||
sudo apt-get remove --auto-remove android-sdk-platform-tools || true
|
||||
sudo apt-get purge --auto-remove android-sdk-platform-tools || true
|
||||
sudo rm -rf /usr/local/lib/android
|
||||
sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
|
||||
sudo rm -rf /usr/share/dotnet
|
||||
sudo apt-get remove -y '^mono-.*' || true
|
||||
sudo apt-get remove -y '^ghc-.*' || true
|
||||
sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
|
||||
sudo apt-get remove -y 'php.*' || true
|
||||
sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
|
||||
sudo apt-get remove -y '^google-.*' || true
|
||||
sudo apt-get remove -y azure-cli || true
|
||||
sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
|
||||
sudo apt-get remove -y '^gfortran-.*' || true
|
||||
sudo apt-get remove -y microsoft-edge-stable || true
|
||||
sudo apt-get remove -y firefox || true
|
||||
sudo apt-get remove -y powershell || true
|
||||
sudo apt-get remove -y r-base-core || true
|
||||
sudo apt-get autoremove -y
|
||||
sudo apt-get clean
|
||||
echo
|
||||
echo "Listing top largest packages"
|
||||
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
|
||||
head -n 30 <<< "${pkgs}"
|
||||
echo
|
||||
sudo rm -rfv build || true
|
||||
sudo rm -rf /usr/share/dotnet || true
|
||||
sudo rm -rf /opt/ghc || true
|
||||
sudo rm -rf "/usr/local/share/boost" || true
|
||||
sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
|
||||
df -h
|
||||
|
||||
- name: Set up QEMU
|
||||
uses: docker/setup-qemu-action@master
|
||||
with:
|
||||
platforms: all
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
id: buildx
|
||||
uses: docker/setup-buildx-action@master
|
||||
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v6
|
||||
|
||||
- name: Cache GRPC
|
||||
uses: docker/build-push-action@v7
|
||||
with:
|
||||
builder: ${{ steps.buildx.outputs.name }}
|
||||
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
|
||||
# This means that even the MAKEFLAGS have to be an EXACT match.
|
||||
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
|
||||
build-args: |
|
||||
GRPC_BASE_IMAGE=${{ matrix.grpc-base-image }}
|
||||
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
|
||||
GRPC_VERSION=v1.65.0
|
||||
context: .
|
||||
file: ./Dockerfile
|
||||
cache-to: type=gha,ignore-error=true
|
||||
cache-from: type=gha
|
||||
target: grpc
|
||||
platforms: ${{ matrix.platforms }}
|
||||
push: false
|
||||
2
.github/workflows/generate_intel_image.yaml
vendored
@@ -16,7 +16,7 @@ jobs:
|
||||
strategy:
|
||||
matrix:
|
||||
include:
|
||||
- base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
|
||||
- base-image: intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04
|
||||
runs-on: 'arc-runner-set'
|
||||
platforms: 'linux/amd64'
|
||||
runs-on: ${{matrix.runs-on}}
|
||||
|
||||
5
.github/workflows/image-pr.yml
vendored
@@ -20,7 +20,6 @@
|
||||
platforms: ${{ matrix.platforms }}
|
||||
runs-on: ${{ matrix.runs-on }}
|
||||
base-image: ${{ matrix.base-image }}
|
||||
grpc-base-image: ${{ matrix.grpc-base-image }}
|
||||
makeflags: ${{ matrix.makeflags }}
|
||||
ubuntu-version: ${{ matrix.ubuntu-version }}
|
||||
secrets:
|
||||
@@ -60,15 +59,13 @@
|
||||
tag-latest: 'false'
|
||||
tag-suffix: '-hipblas'
|
||||
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
|
||||
grpc-base-image: "ubuntu:24.04"
|
||||
runs-on: 'ubuntu-latest'
|
||||
makeflags: "--jobs=3 --output-sync=target"
|
||||
ubuntu-version: '2404'
|
||||
- build-type: 'sycl'
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'false'
|
||||
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
|
||||
grpc-base-image: "ubuntu:24.04"
|
||||
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
|
||||
tag-suffix: 'sycl'
|
||||
runs-on: 'ubuntu-latest'
|
||||
makeflags: "--jobs=3 --output-sync=target"
|
||||
|
||||
9
.github/workflows/image.yml
vendored
@@ -25,7 +25,6 @@
|
||||
platforms: ${{ matrix.platforms }}
|
||||
runs-on: ${{ matrix.runs-on }}
|
||||
base-image: ${{ matrix.base-image }}
|
||||
grpc-base-image: ${{ matrix.grpc-base-image }}
|
||||
makeflags: ${{ matrix.makeflags }}
|
||||
ubuntu-version: ${{ matrix.ubuntu-version }}
|
||||
ubuntu-codename: ${{ matrix.ubuntu-codename }}
|
||||
@@ -42,12 +41,11 @@
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-hipblas'
|
||||
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
|
||||
grpc-base-image: "ubuntu:24.04"
|
||||
runs-on: 'ubuntu-latest'
|
||||
makeflags: "--jobs=3 --output-sync=target"
|
||||
ubuntu-version: '2404'
|
||||
ubuntu-codename: 'noble'
|
||||
|
||||
|
||||
core-image-build:
|
||||
if: github.repository == 'mudler/LocalAI'
|
||||
uses: ./.github/workflows/image_build.yml
|
||||
@@ -60,7 +58,6 @@
|
||||
platforms: ${{ matrix.platforms }}
|
||||
runs-on: ${{ matrix.runs-on }}
|
||||
base-image: ${{ matrix.base-image }}
|
||||
grpc-base-image: ${{ matrix.grpc-base-image }}
|
||||
makeflags: ${{ matrix.makeflags }}
|
||||
skip-drivers: ${{ matrix.skip-drivers }}
|
||||
ubuntu-version: ${{ matrix.ubuntu-version }}
|
||||
@@ -121,8 +118,7 @@
|
||||
- build-type: 'intel'
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
|
||||
grpc-base-image: "ubuntu:24.04"
|
||||
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
|
||||
tag-suffix: '-gpu-intel'
|
||||
runs-on: 'ubuntu-latest'
|
||||
makeflags: "--jobs=3 --output-sync=target"
|
||||
@@ -141,7 +137,6 @@
|
||||
platforms: ${{ matrix.platforms }}
|
||||
runs-on: ${{ matrix.runs-on }}
|
||||
base-image: ${{ matrix.base-image }}
|
||||
grpc-base-image: ${{ matrix.grpc-base-image }}
|
||||
makeflags: ${{ matrix.makeflags }}
|
||||
skip-drivers: ${{ matrix.skip-drivers }}
|
||||
ubuntu-version: ${{ matrix.ubuntu-version }}
|
||||
|
||||
24
.github/workflows/image_build.yml
vendored
@@ -8,11 +8,6 @@ on:
|
||||
description: 'Base image'
|
||||
required: true
|
||||
type: string
|
||||
grpc-base-image:
|
||||
description: 'GRPC Base image, must be a compatible image with base-image'
|
||||
required: false
|
||||
default: ''
|
||||
type: string
|
||||
build-type:
|
||||
description: 'Build type'
|
||||
default: ''
|
||||
@@ -201,25 +196,19 @@ jobs:
|
||||
if: github.event_name != 'pull_request'
|
||||
with:
|
||||
builder: ${{ steps.buildx.outputs.name }}
|
||||
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
|
||||
# This means that even the MAKEFLAGS have to be an EXACT match.
|
||||
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
|
||||
# This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
|
||||
build-args: |
|
||||
BUILD_TYPE=${{ inputs.build-type }}
|
||||
CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
|
||||
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
|
||||
BASE_IMAGE=${{ inputs.base-image }}
|
||||
GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
|
||||
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
|
||||
GRPC_VERSION=v1.65.0
|
||||
MAKEFLAGS=${{ inputs.makeflags }}
|
||||
SKIP_DRIVERS=${{ inputs.skip-drivers }}
|
||||
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
|
||||
UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
|
||||
context: .
|
||||
file: ./Dockerfile
|
||||
cache-from: type=gha
|
||||
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
|
||||
cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }},mode=max,ignore-error=true
|
||||
platforms: ${{ inputs.platforms }}
|
||||
push: ${{ github.event_name != 'pull_request' }}
|
||||
tags: ${{ steps.meta.outputs.tags }}
|
||||
@@ -230,25 +219,18 @@ jobs:
|
||||
if: github.event_name == 'pull_request'
|
||||
with:
|
||||
builder: ${{ steps.buildx.outputs.name }}
|
||||
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
|
||||
# This means that even the MAKEFLAGS have to be an EXACT match.
|
||||
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
|
||||
# This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
|
||||
build-args: |
|
||||
BUILD_TYPE=${{ inputs.build-type }}
|
||||
CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
|
||||
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
|
||||
BASE_IMAGE=${{ inputs.base-image }}
|
||||
GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
|
||||
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
|
||||
GRPC_VERSION=v1.65.0
|
||||
MAKEFLAGS=${{ inputs.makeflags }}
|
||||
SKIP_DRIVERS=${{ inputs.skip-drivers }}
|
||||
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
|
||||
UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
|
||||
context: .
|
||||
file: ./Dockerfile
|
||||
cache-from: type=gha
|
||||
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
|
||||
platforms: ${{ inputs.platforms }}
|
||||
#push: true
|
||||
tags: ${{ steps.meta_pull_request.outputs.tags }}
|
||||
|
||||
67
.github/workflows/test-extra.yml
vendored
@@ -40,6 +40,7 @@ jobs:
|
||||
kokoros: ${{ steps.detect.outputs.kokoros }}
|
||||
insightface: ${{ steps.detect.outputs.insightface }}
|
||||
speaker-recognition: ${{ steps.detect.outputs.speaker-recognition }}
|
||||
sherpa-onnx: ${{ steps.detect.outputs.sherpa-onnx }}
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@v6
|
||||
@@ -506,6 +507,72 @@ jobs:
|
||||
- name: Build llama-cpp backend image and run audio transcription gRPC e2e tests
|
||||
run: |
|
||||
make test-extra-backend-llama-cpp-transcription
|
||||
# Realtime e2e with sherpa-onnx driving VAD + STT + TTS against a mocked LLM.
|
||||
# Builds the sherpa-onnx Docker image, extracts the rootfs so the e2e suite
|
||||
# can discover the backend binary + shared libs, downloads the three model
|
||||
# bundles (silero-vad, omnilingual-asr, vits-ljs) and drives the realtime
|
||||
# websocket spec end-to-end.
|
||||
tests-sherpa-onnx-realtime:
|
||||
needs: detect-changes
|
||||
if: needs.detect-changes.outputs.sherpa-onnx == 'true' || needs.detect-changes.outputs.run-all == 'true'
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 90
|
||||
steps:
|
||||
- name: Clone
|
||||
uses: actions/checkout@v6
|
||||
with:
|
||||
submodules: true
|
||||
- name: Setup Go
|
||||
uses: actions/setup-go@v5
|
||||
with:
|
||||
go-version: '1.25.4'
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v6
|
||||
with:
|
||||
node-version: '22'
|
||||
- name: Build sherpa-onnx backend image and run realtime e2e tests
|
||||
run: |
|
||||
make test-extra-e2e-realtime-sherpa
|
||||
# Streaming ASR via the sherpa-onnx online recognizer (zipformer
|
||||
# transducer). Exercises both AudioTranscription (buffered) and
|
||||
# AudioTranscriptionStream (real-time deltas) on the e2e-backends
|
||||
# harness.
|
||||
tests-sherpa-onnx-grpc-transcription:
|
||||
needs: detect-changes
|
||||
if: needs.detect-changes.outputs.sherpa-onnx == 'true' || needs.detect-changes.outputs.run-all == 'true'
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 90
|
||||
steps:
|
||||
- name: Clone
|
||||
uses: actions/checkout@v6
|
||||
with:
|
||||
submodules: true
|
||||
- name: Setup Go
|
||||
uses: actions/setup-go@v5
|
||||
with:
|
||||
go-version: '1.25.4'
|
||||
- name: Build sherpa-onnx backend image and run streaming ASR gRPC e2e tests
|
||||
run: |
|
||||
make test-extra-backend-sherpa-onnx-transcription
|
||||
# VITS TTS via the sherpa-onnx backend. Drives both TTS (file write) and
|
||||
# TTSStream (PCM chunks) on the e2e-backends harness.
|
||||
tests-sherpa-onnx-grpc-tts:
|
||||
needs: detect-changes
|
||||
if: needs.detect-changes.outputs.sherpa-onnx == 'true' || needs.detect-changes.outputs.run-all == 'true'
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 90
|
||||
steps:
|
||||
- name: Clone
|
||||
uses: actions/checkout@v6
|
||||
with:
|
||||
submodules: true
|
||||
- name: Setup Go
|
||||
uses: actions/setup-go@v5
|
||||
with:
|
||||
go-version: '1.25.4'
|
||||
- name: Build sherpa-onnx backend image and run TTS gRPC e2e tests
|
||||
run: |
|
||||
make test-extra-backend-sherpa-onnx-tts
|
||||
tests-ik-llama-cpp-grpc:
|
||||
needs: detect-changes
|
||||
if: needs.detect-changes.outputs.ik-llama-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
|
||||
|
||||
5
.github/workflows/test.yml
vendored
@@ -9,9 +9,6 @@ on:
|
||||
tags:
|
||||
- '*'
|
||||
|
||||
env:
|
||||
GRPC_VERSION: v1.65.0
|
||||
|
||||
concurrency:
|
||||
group: ci-tests-${{ github.head_ref || github.ref }}-${{ github.repository }}
|
||||
cancel-in-progress: true
|
||||
@@ -195,7 +192,7 @@ jobs:
|
||||
run: go version
|
||||
- name: Dependencies
|
||||
run: |
|
||||
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm opus
|
||||
brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm opus ffmpeg
|
||||
pip install --user --no-cache-dir grpcio-tools grpcio
|
||||
- name: Setup Node.js
|
||||
uses: actions/setup-node@v6
|
||||
|
||||
@@ -19,6 +19,7 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]
|
||||
|------|-------------|
|
||||
| [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md) | Policy for AI-assisted contributions — licensing, DCO, attribution |
|
||||
| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
|
||||
| [.agents/ci-caching.md](.agents/ci-caching.md) | CI build cache layout (registry-backed BuildKit cache on quay.io/go-skynet/ci-cache), `DEPS_REFRESH` weekly cache-buster for unpinned Python deps, manual eviction |
|
||||
| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist, including importer integration (the `/import-model` dropdown is server-driven from `GET /backends/known`) |
|
||||
| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
|
||||
| [.agents/llama-cpp-backend.md](.agents/llama-cpp-backend.md) | Working on the llama.cpp backend — architecture, updating, tool call parsing |
|
||||
|
||||
@@ -1,5 +1,4 @@
|
||||
ARG BASE_IMAGE=ubuntu:24.04
|
||||
ARG GRPC_BASE_IMAGE=${BASE_IMAGE}
|
||||
ARG INTEL_BASE_IMAGE=${BASE_IMAGE}
|
||||
ARG UBUNTU_CODENAME=noble
|
||||
|
||||
@@ -149,6 +148,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
|
||||
apt-get update && \
|
||||
apt-get install -y --no-install-recommends \
|
||||
hipblas-dev \
|
||||
hipblaslt-dev \
|
||||
rocblas-dev && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/* && \
|
||||
|
||||
54
Makefile
@@ -1,5 +1,5 @@
|
||||
# Disable parallel execution for backend builds
|
||||
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad
|
||||
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad backends/sherpa-onnx
|
||||
|
||||
GOCMD=go
|
||||
GOTEST=$(GOCMD) test
|
||||
@@ -394,7 +394,13 @@ protoc:
|
||||
.PHONY: protogen-go
|
||||
protogen-go: protoc install-go-tools
|
||||
mkdir -p pkg/grpc/proto
|
||||
./protoc --experimental_allow_proto3_optional -Ibackend/ --go_out=pkg/grpc/proto/ --go_opt=paths=source_relative --go-grpc_out=pkg/grpc/proto/ --go-grpc_opt=paths=source_relative \
|
||||
# install-go-tools writes protoc-gen-go and protoc-gen-go-grpc into
|
||||
# $(shell go env GOPATH)/bin, which isn't on every dev's PATH. protoc
|
||||
# resolves its code-gen plugins via PATH, so without this prefix the
|
||||
# generate step fails with "protoc-gen-go: program not found". Prepend
|
||||
# GOPATH/bin so the freshly-installed plugins win without requiring a
|
||||
# shell-profile change.
|
||||
PATH="$$(go env GOPATH)/bin:$$PATH" ./protoc --experimental_allow_proto3_optional -Ibackend/ --go_out=pkg/grpc/proto/ --go_opt=paths=source_relative --go-grpc_out=pkg/grpc/proto/ --go-grpc_opt=paths=source_relative \
|
||||
backend/backend.proto
|
||||
|
||||
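The PATH prefix in the `protogen-go` recipe above can be illustrated with a small Go sketch. It shows only the ordering idea: a directory placed first in a PATH-style string wins the lookup. The helper names and the `/home/dev/go` location are hypothetical; the Makefile itself obtains the real directory from `go env GOPATH`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// prependToPath returns pathEnv with dir placed first, so executables in dir
// are found before same-named executables later in the search path.
// An empty pathEnv yields just dir.
func prependToPath(dir, pathEnv string) string {
	if pathEnv == "" {
		return dir
	}
	return dir + string(os.PathListSeparator) + pathEnv
}

// firstPathEntry returns the first entry of a PATH-style string, or "" if
// the string is empty.
func firstPathEntry(pathEnv string) string {
	parts := filepath.SplitList(pathEnv)
	if len(parts) == 0 {
		return ""
	}
	return parts[0]
}

func main() {
	// Hypothetical GOPATH, used here only for illustration.
	gopathBin := filepath.Join("/home/dev/go", "bin")
	newPath := prependToPath(gopathBin, "/usr/local/bin:/usr/bin")
	fmt.Println(newPath)
	fmt.Println(strings.HasPrefix(newPath, gopathBin))
}
```

Because the prefix is applied per command rather than exported, the rest of the build and the developer's shell profile are left untouched.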
core/config/inference_defaults.json: ## Fetch inference defaults from unsloth (only if missing)
|
||||
@@ -780,6 +786,44 @@ test-extra-backend-speaker-recognition-ecapa: docker-build-speaker-recognition
|
||||
test-extra-backend-speaker-recognition-all: \
|
||||
test-extra-backend-speaker-recognition-ecapa
|
||||
|
||||
## Realtime e2e with sherpa-onnx driving VAD + STT + TTS against a mocked
|
||||
## LLM. Extracts the sherpa-onnx Docker image rootfs, downloads the three
|
||||
## gallery-referenced model bundles (silero-vad, omnilingual-asr, vits-ljs),
|
||||
## writes the corresponding model config YAMLs, and runs the realtime
|
||||
## websocket spec in tests/e2e with REALTIME_* env vars wiring the sherpa
|
||||
## slots into the pipeline. The LLM slot stays on the in-repo mock-backend
|
||||
## registered unconditionally by tests/e2e/e2e_suite_test.go. See
|
||||
## tests/e2e/run-realtime-sherpa.sh for the full orchestration.
|
||||
test-extra-e2e-realtime-sherpa: build-mock-backend docker-build-sherpa-onnx protogen-go react-ui
|
||||
bash tests/e2e/run-realtime-sherpa.sh
|
||||
|
||||
## Streaming ASR via the sherpa-onnx online recognizer. Uses the streaming
|
||||
## zipformer English model (encoder/decoder/joiner int8 + tokens) from the
|
||||
## sherpa-onnx gallery entry. Drives both AudioTranscription and
|
||||
## AudioTranscriptionStream via the e2e-backends gRPC harness; streaming
|
||||
## emits real partial deltas during decode. Each file is renamed on download
|
||||
## to the shape sherpa-onnx's online loader expects (encoder.int8.onnx etc.).
|
||||
test-extra-backend-sherpa-onnx-transcription: docker-build-sherpa-onnx
|
||||
BACKEND_IMAGE=local-ai-backend:sherpa-onnx \
|
||||
BACKEND_TEST_MODEL_URL='https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26/resolve/main/encoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx#encoder.int8.onnx' \
|
||||
BACKEND_TEST_EXTRA_FILES='https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26/resolve/main/decoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx#decoder.int8.onnx|https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26/resolve/main/joiner-epoch-99-avg-1-chunk-16-left-128.int8.onnx#joiner.int8.onnx|https://huggingface.co/csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26/resolve/main/tokens.txt' \
|
||||
BACKEND_TEST_AUDIO_URL=https://github.com/ggml-org/whisper.cpp/raw/master/samples/jfk.wav \
|
||||
BACKEND_TEST_CAPS=health,load,transcription \
|
||||
BACKEND_TEST_OPTIONS=subtype=online \
|
||||
$(MAKE) test-extra-backend
|
||||
|
||||
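The `BACKEND_TEST_MODEL_URL` and `BACKEND_TEST_EXTRA_FILES` values above pack a download URL and a target filename into one string (`url#name`), with multiple entries separated by `|`. A minimal Go sketch of how such a spec could be split is shown below. This is not the e2e-backends harness code: the type and function names are hypothetical, and falling back to the URL basename when `#name` is absent is an assumption based on the `tokens.txt` entries.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// download describes one file to fetch and the local name to store it under.
type download struct {
	URL  string
	Name string
}

// parseSpec splits a "url#name|url#name" list into downloads.
// Assumption: when "#name" is absent, the basename of the URL is used.
func parseSpec(spec string) []download {
	var out []download
	for _, item := range strings.Split(spec, "|") {
		item = strings.TrimSpace(item)
		if item == "" {
			continue // tolerate empty segments such as a trailing "|"
		}
		url, name, found := strings.Cut(item, "#")
		if !found || name == "" {
			name = path.Base(url)
		}
		out = append(out, download{URL: url, Name: name})
	}
	return out
}

func main() {
	// Placeholder host; the real targets point at HuggingFace.
	spec := "https://example.com/m/decoder-epoch-99.int8.onnx#decoder.int8.onnx|" +
		"https://example.com/m/tokens.txt"
	for _, d := range parseSpec(spec) {
		fmt.Printf("%s -> %s\n", d.URL, d.Name)
	}
}
```

The rename step matters for the streaming zipformer model because the upstream filenames carry training metadata (epoch, chunk size) that the loader does not expect.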
## VITS TTS via the sherpa-onnx backend. Pulls the individual files from
|
||||
## HuggingFace (the vits-ljs release tarball lives on the k2-fsa github
|
||||
## but is also mirrored as discrete files on HF). Exercises both
|
||||
## TTS (write-to-file) and TTSStream (PCM chunks + WAV header) via the
|
||||
## e2e-backends gRPC harness.
|
||||
test-extra-backend-sherpa-onnx-tts: docker-build-sherpa-onnx
|
||||
BACKEND_IMAGE=local-ai-backend:sherpa-onnx \
|
||||
BACKEND_TEST_MODEL_URL='https://huggingface.co/csukuangfj/vits-ljs/resolve/main/vits-ljs.onnx#vits-ljs.onnx' \
|
||||
BACKEND_TEST_EXTRA_FILES='https://huggingface.co/csukuangfj/vits-ljs/resolve/main/tokens.txt|https://huggingface.co/csukuangfj/vits-ljs/resolve/main/lexicon.txt' \
|
||||
BACKEND_TEST_CAPS=health,load,tts \
|
||||
$(MAKE) test-extra-backend
|
||||
|
||||
## sglang mirrors the vllm setup: HuggingFace model id, same tiny Qwen,
|
||||
## tool-call extraction via sglang's native qwen parser. CPU builds use
|
||||
## sglang's upstream pyproject_cpu.toml recipe (see backend/python/sglang/install.sh).
|
||||
@@ -839,7 +883,7 @@ docker-cuda12:
|
||||
|
||||
docker-image-intel:
|
||||
docker build \
|
||||
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 \
|
||||
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04 \
|
||||
--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
|
||||
--build-arg GO_TAGS="$(GO_TAGS)" \
|
||||
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
|
||||
@@ -917,6 +961,7 @@ BACKEND_VOXTRAL = voxtral|golang|.|false|true
|
||||
BACKEND_ACESTEP_CPP = acestep-cpp|golang|.|false|true
|
||||
BACKEND_QWEN3_TTS_CPP = qwen3-tts-cpp|golang|.|false|true
|
||||
BACKEND_OPUS = opus|golang|.|false|true
|
||||
BACKEND_SHERPA_ONNX = sherpa-onnx|golang|.|false|true
|
||||
|
||||
# Python backends with root context
|
||||
BACKEND_RERANKERS = rerankers|python|.|false|true
|
||||
@@ -1029,12 +1074,13 @@ $(eval $(call generate-docker-build-target,$(BACKEND_LLAMA_CPP_QUANTIZATION)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_TINYGRAD)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_KOKOROS)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_SAM3_CPP)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_SHERPA_ONNX)))
|
||||
|
||||
# Pattern rule for docker-save targets
|
||||
docker-save-%: backend-images
|
||||
docker save local-ai-backend:$* -o backend-images/$*.tar
|
||||
|
||||
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp docker-build-insightface docker-build-speaker-recognition
|
||||
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp docker-build-insightface docker-build-speaker-recognition docker-build-sherpa-onnx
|
||||
|
||||
########################################################
|
||||
### Mock Backend for E2E Tests
|
||||
|
||||
@@ -147,6 +147,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
|
||||
apt-get update && \
|
||||
apt-get install -y --no-install-recommends \
|
||||
hipblas-dev \
|
||||
hipblaslt-dev \
|
||||
rocblas-dev && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/* && \
|
||||
|
||||
@@ -204,6 +204,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
|
||||
apt-get update && \
|
||||
apt-get install -y --no-install-recommends \
|
||||
hipblas-dev \
|
||||
hipblaslt-dev \
|
||||
rocblas-dev && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/* && \
|
||||
|
||||
@@ -206,6 +206,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
|
||||
apt-get update && \
|
||||
apt-get install -y --no-install-recommends \
|
||||
hipblas-dev \
|
||||
hipblaslt-dev \
|
||||
rocblas-dev && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/* && \
|
||||
|
||||
@@ -162,6 +162,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
|
||||
apt-get update && \
|
||||
apt-get install -y --no-install-recommends \
|
||||
hipblas-dev \
|
||||
hipblaslt-dev \
|
||||
rocblas-dev && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/* && \
|
||||
@@ -202,6 +203,13 @@ COPY scripts/build/package-gpu-libs.sh /package-gpu-libs.sh
|
||||
ARG FROM_SOURCE=""
|
||||
ENV FROM_SOURCE=${FROM_SOURCE}
|
||||
|
||||
# Cache-buster for the per-backend `make` step. Most Python backends list
|
||||
# unpinned deps (torch, transformers, vllm, ...), so a warm registry cache
|
||||
# would otherwise freeze upstream versions indefinitely. CI passes a value
|
||||
# that rolls weekly so the install layer is rebuilt at most once per week
|
||||
# and picks up newer wheels from PyPI / nightly indexes.
|
||||
ARG DEPS_REFRESH=initial
|
||||
|
||||
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
|
||||
|
||||
# Package GPU libraries into the backend's lib directory
|
||||
@@ -216,4 +224,4 @@ RUN if [ -f "/${BACKEND}/package.sh" ]; then \
|
||||
|
||||
FROM scratch
|
||||
ARG BACKEND=rerankers
|
||||
COPY --from=builder /${BACKEND}/ /
|
||||
COPY --from=builder /${BACKEND}/ /
|
||||
|
||||
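The `DEPS_REFRESH` comment above says CI passes a value that rolls weekly. How CI actually derives that value is not shown in this diff; the Go sketch below is one possible way to produce such a token, using the ISO year and week number. The function names and the `YYYY-Www` format are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// weeklyToken returns a string that stays constant within one ISO week (UTC)
// and changes when the week rolls over. The format is year, "-W", and a
// zero-padded week number.
func weeklyToken(t time.Time) string {
	year, week := t.UTC().ISOWeek()
	return fmt.Sprintf("%d-W%02d", year, week)
}

// dateUTC builds a noon-UTC timestamp for the given calendar date.
func dateUTC(year, month, day int) time.Time {
	return time.Date(year, time.Month(month), day, 12, 0, 0, 0, time.UTC)
}

func main() {
	// A value like this could be passed as --build-arg DEPS_REFRESH=<token>.
	fmt.Println("DEPS_REFRESH=" + weeklyToken(time.Now()))
}
```

Any build arg that changes invalidates the layers after its `ARG` declaration, so a token like this would leave earlier layers (system packages, drivers) cached while forcing the `make` step to re-resolve unpinned dependencies once per week.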
@@ -204,6 +204,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
|
||||
apt-get update && \
|
||||
apt-get install -y --no-install-recommends \
|
||||
hipblas-dev \
|
||||
hipblaslt-dev \
|
||||
rocblas-dev && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/* && \
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
|
||||
IK_LLAMA_VERSION?=286ce324baed17c95faec77792eaa6bdb1c7a5f5
|
||||
IK_LLAMA_VERSION?=3a945af45d45936341a45bbf7deda56776a4af26
|
||||
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
|
||||
|
||||
CMAKE_ARGS?=
|
||||
|
||||
@@ -0,0 +1,11 @@
|
||||
--- a/examples/llava/clip.cpp
|
||||
+++ b/examples/llava/clip.cpp
|
||||
@@ -2494,7 +2494,7 @@
|
||||
}
|
||||
new_data = work.data();
|
||||
|
||||
- new_size = ggml_quantize_chunk(new_type, f32_data, new_data, 0, n_elms/cur->ne[0], cur->ne[0], nullptr);
|
||||
+ new_size = ggml_quantize_chunk(new_type, f32_data, new_data, 0, n_elms/cur->ne[0], cur->ne[0], nullptr, nullptr);
|
||||
} else {
|
||||
new_type = cur->type;
|
||||
new_data = cur->data;
|
||||
@@ -1,5 +1,5 @@
|
||||
|
||||
LLAMA_VERSION?=0d0764dfd257c0ae862525c05778207f87b99b1c
|
||||
LLAMA_VERSION?=f53577432541bb9edc1588c4ef45c66bf07e4468
|
||||
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
|
||||
|
||||
CMAKE_ARGS?=
|
||||
|
||||
@@ -642,6 +642,21 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
|
||||
} else if (optval_str == "false" || optval_str == "0" || optval_str == "no" || optval_str == "off" || optval_str == "disabled") {
|
||||
params.no_op_offload = false;
|
||||
}
|
||||
} else if (!strcmp(optname, "split_mode") || !strcmp(optname, "sm")) {
|
||||
// Accepts: none | layer | row | tensor (the latter requires a llama.cpp build
|
||||
// that includes ggml-org/llama.cpp#19378, FlashAttention enabled, and KV-cache
|
||||
// quantization disabled).
|
||||
if (optval != NULL) {
|
||||
if (optval_str == "none") {
|
||||
params.split_mode = LLAMA_SPLIT_MODE_NONE;
|
||||
} else if (optval_str == "layer") {
|
||||
params.split_mode = LLAMA_SPLIT_MODE_LAYER;
|
||||
} else if (optval_str == "row") {
|
||||
params.split_mode = LLAMA_SPLIT_MODE_ROW;
|
||||
} else if (optval_str == "tensor") {
|
||||
params.split_mode = LLAMA_SPLIT_MODE_TENSOR;
|
||||
}
|
||||
}
|
||||
} else if (!strcmp(optname, "kv_unified") || !strcmp(optname, "unified_kv")) {
|
||||
if (optval_str == "true" || optval_str == "1" || optval_str == "yes" || optval_str == "on" || optval_str == "enabled") {
|
||||
params.kv_unified = true;
|
||||
|
||||
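The `split_mode` branch above maps four string values onto llama.cpp split modes and leaves the setting unchanged for anything else. The same shape can be sketched in Go as follows. The `splitMode` type and its constants are hypothetical stand-ins for the `LLAMA_SPLIT_MODE_*` values, not part of any real Go binding.

```go
package main

import "fmt"

// splitMode is a hypothetical stand-in for llama.cpp's split mode enum.
type splitMode int

const (
	splitNone splitMode = iota
	splitLayer
	splitRow
	splitTensor
)

// parseSplitMode returns the mode named by val. Unrecognized values leave
// the current mode unchanged, matching the behaviour of the C++ branch.
func parseSplitMode(val string, current splitMode) splitMode {
	switch val {
	case "none":
		return splitNone
	case "layer":
		return splitLayer
	case "row":
		return splitRow
	case "tensor":
		return splitTensor
	default:
		return current
	}
}

func main() {
	mode := splitLayer
	mode = parseSplitMode("tensor", mode)
	fmt.Println(mode == splitTensor)
	mode = parseSplitMode("bogus", mode) // ignored, mode stays as it was
	fmt.Println(mode == splitTensor)
}
```

One consequence of this design is that a typo in the option value is silently ignored rather than reported, so the configured default stays in effect.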
@@ -1,7 +1,7 @@
|
||||
|
||||
# Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
|
||||
# Auto-bumped nightly by .github/workflows/bump_deps.yaml.
|
||||
TURBOQUANT_VERSION?=627ebbc6e27727bd4f65422d8aa60b13404993c8
|
||||
TURBOQUANT_VERSION?=11a241d0db78a68e0a5b99fe6f36de6683100f6a
|
||||
LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant
|
||||
|
||||
CMAKE_ARGS?=
|
||||
|
||||
@@ -4,7 +4,6 @@ package main
|
||||
// It is meant to be used by the main executable that is the server for the specific backend type (falcon, gpt3, etc)
|
||||
import (
|
||||
"container/heap"
|
||||
"errors"
|
||||
"fmt"
|
||||
"math"
|
||||
"slices"
|
||||
@@ -100,9 +99,16 @@ func sortIntoKeySlicese(keys []*pb.StoresKey) [][]float32 {
|
||||
}
|
||||
|
||||
func (s *Store) Load(opts *pb.ModelOptions) error {
|
||||
if opts.Model != "" {
|
||||
return errors.New("not implemented")
|
||||
}
|
||||
// local-store is an in-memory vector store with no on-disk artefact to
|
||||
// load — opts.Model is just a namespace identifier. The old `!= ""` guard
|
||||
// rejected any non-empty model name with "not implemented", which broke
|
||||
// callers that pass a namespace to isolate embedding spaces (face vs.
|
||||
// voice biometrics both go through local-store but need distinct stores
|
||||
// so ArcFace 512-D and ECAPA-TDNN 192-D don't collide). Namespace
|
||||
// isolation is already handled upstream: ModelLoader spawns a fresh
|
||||
// local-store process per (backend, model) tuple, so each namespace is
|
||||
// its own Store{} instance. Nothing to do here beyond accepting the load.
|
||||
_ = opts
|
||||
return nil
|
||||
}
|
||||
|
||||
|
||||
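The comment in `Load` above relies on each (backend, model) tuple getting its own `Store{}` instance, so embeddings of different dimensionality never share a store. A minimal in-process Go sketch of that idea follows. In LocalAI the isolation comes from separate processes spawned by ModelLoader; the `registry` and `store` types here are hypothetical and only model the keying.

```go
package main

import "fmt"

// store is a stand-in for an in-memory vector store. It fixes its
// dimensionality on the first non-empty insert and rejects mismatches.
type store struct {
	dim  int
	keys [][]float32
}

func (s *store) add(v []float32) error {
	if s.dim == 0 {
		s.dim = len(v)
	}
	if len(v) != s.dim {
		return fmt.Errorf("dimension mismatch: store has %d, got %d", s.dim, len(v))
	}
	s.keys = append(s.keys, v)
	return nil
}

// instanceKey identifies one store instance; model acts purely as a namespace.
type instanceKey struct {
	backend string
	model   string
}

// registry hands out exactly one store per (backend, model) pair.
type registry struct {
	stores map[instanceKey]*store
}

func newRegistry() *registry {
	return &registry{stores: map[instanceKey]*store{}}
}

func (r *registry) get(backend, model string) *store {
	k := instanceKey{backend: backend, model: model}
	if s, ok := r.stores[k]; ok {
		return s
	}
	s := &store{}
	r.stores[k] = s
	return s
}

func main() {
	r := newRegistry()
	faces := r.get("local-store", "faces")   // e.g. 512-D embeddings
	voices := r.get("local-store", "voices") // e.g. 192-D embeddings
	fmt.Println(faces.add(make([]float32, 512)))
	fmt.Println(voices.add(make([]float32, 192)))
	fmt.Println(faces.add(make([]float32, 192))) // rejected: wrong dimension
}
```

With the namespace handled by the key, the store itself has nothing to load, which is why accepting any model name and returning `nil` is sufficient.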
11
backend/go/sherpa-onnx/.gitignore
vendored
Normal file
@@ -0,0 +1,11 @@
|
||||
.cache/
|
||||
sources/
|
||||
build*/
|
||||
package/
|
||||
backend-assets/
|
||||
sherpa-onnx
|
||||
*.so
|
||||
compile_commands.json
|
||||
sherpa-onnx-whisper-*
|
||||
vits-ljs/
|
||||
streaming-zipformer-en/
|
||||
120
backend/go/sherpa-onnx/Makefile
Normal file
@@ -0,0 +1,120 @@
|
||||
CURRENT_DIR=$(abspath ./)
|
||||
GOCMD=go
|
||||
|
||||
ONNX_VERSION?=1.24.4
|
||||
# v1.12.39 — includes upstream's onnxruntime 1.24.4 bump (#3501). Earlier
|
||||
# pinned commits only support onnxruntime 1.23.2, which has no CUDA 13
|
||||
# pre-built tarball, blocking the -gpu-nvidia-cuda-13 build matrix entry.
|
||||
SHERPA_COMMIT?=7288d15e3e31a7bd589b2ba88828d521e7a6b140
|
||||
ONNX_ARCH?=x64
|
||||
ONNX_OS?=linux
|
||||
|
||||
ifneq (,$(findstring aarch64,$(shell uname -m)))
|
||||
ONNX_ARCH=aarch64
|
||||
endif
|
||||
|
||||
ifeq ($(OS),Darwin)
|
||||
ONNX_OS=osx
|
||||
ifneq (,$(findstring aarch64,$(shell uname -m)))
|
||||
ONNX_ARCH=arm64
|
||||
else ifneq (,$(findstring arm64,$(shell uname -m)))
|
||||
ONNX_ARCH=arm64
|
||||
else
|
||||
ONNX_ARCH=x86_64
|
||||
endif
|
||||
endif
|
||||
|
||||
# Upstream onnxruntime ships CUDA 12 and CUDA 13 variants under different
|
||||
# names: -gpu-<ver>.tgz for CUDA 12, -gpu_cuda13-<ver>.tgz for CUDA 13
|
||||
# (note underscore vs dash). CUDA 13 tarballs only exist from 1.24.x onward.
|
||||
ifeq ($(BUILD_TYPE),cublas)
|
||||
SHERPA_GPU=ON
|
||||
ONNX_PROVIDER=cuda
|
||||
ifeq ($(CUDA_MAJOR_VERSION),13)
|
||||
ONNX_VARIANT=-gpu_cuda13
|
||||
else
|
||||
ONNX_VARIANT=-gpu
|
||||
endif
|
||||
else
|
||||
ONNX_VARIANT=
|
||||
SHERPA_GPU=OFF
|
||||
ONNX_PROVIDER=cpu
|
||||
endif
|
||||
|
||||
JOBS?=$(shell nproc --ignore=1 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
|
||||
|
||||
sources/onnxruntime:
|
||||
mkdir -p sources/onnxruntime
|
||||
curl -L https://github.com/microsoft/onnxruntime/releases/download/v$(ONNX_VERSION)/onnxruntime-$(ONNX_OS)-$(ONNX_ARCH)$(ONNX_VARIANT)-$(ONNX_VERSION).tgz \
|
||||
-o sources/onnxruntime/onnxruntime.tgz
|
||||
cd sources/onnxruntime && tar -xf onnxruntime.tgz --strip-components=1 && rm onnxruntime.tgz
|
||||
|
||||
sources/sherpa-onnx: sources/onnxruntime
|
||||
git clone https://github.com/k2-fsa/sherpa-onnx.git sources/sherpa-onnx
|
||||
cd sources/sherpa-onnx && git checkout $(SHERPA_COMMIT)
|
||||
mkdir -p sources/sherpa-onnx/build
|
||||
# sherpa-onnx's cmake detects a pre-installed onnxruntime via the
|
||||
# SHERPA_ONNXRUNTIME_{INCLUDE,LIB}_DIR env vars (not via -D flags).
|
||||
# Point them at our locally-downloaded Microsoft tarball — without
|
||||
# this, sherpa-onnx falls through to download_onnxruntime() which
|
||||
# fetches from csukuangfj/onnxruntime-libs. For the GPU 1.24.4
|
||||
# build that release mirror publishes `-patched.zip` instead of the
|
||||
# expected `.tgz`, so the download 404s and the build fails.
|
||||
cd sources/sherpa-onnx/build && \
|
||||
SHERPA_ONNXRUNTIME_INCLUDE_DIR=$(CURRENT_DIR)/sources/onnxruntime/include \
|
||||
SHERPA_ONNXRUNTIME_LIB_DIR=$(CURRENT_DIR)/sources/onnxruntime/lib \
|
||||
cmake \
|
||||
-DCMAKE_BUILD_TYPE=Release \
|
||||
-DCMAKE_C_FLAGS="-Wno-error=format-security" \
|
||||
-DCMAKE_CXX_FLAGS="-Wno-error=format-security" \
|
||||
-DSHERPA_ONNX_ENABLE_GPU=$(SHERPA_GPU) \
|
||||
-DSHERPA_ONNX_ENABLE_TTS=ON \
|
||||
-DSHERPA_ONNX_ENABLE_BINARY=OFF \
|
||||
-DSHERPA_ONNX_ENABLE_PYTHON=OFF \
|
||||
-DSHERPA_ONNX_ENABLE_TESTS=OFF \
|
||||
-DSHERPA_ONNX_ENABLE_C_API=ON \
|
||||
-DBUILD_SHARED_LIBS=ON \
|
||||
-DSHERPA_ONNX_USE_PRE_INSTALLED_ONNXRUNTIME_IF_AVAILABLE=ON \
|
||||
..
|
||||
cd sources/sherpa-onnx/build && make -j$(JOBS)
|
||||
|
||||
backend-assets/lib: sources/sherpa-onnx sources/onnxruntime
|
||||
mkdir -p backend-assets/lib
|
||||
cp -rfLv sources/onnxruntime/lib/* backend-assets/lib/
|
||||
cp -rfLv sources/sherpa-onnx/build/lib/*.so* backend-assets/lib/ 2>/dev/null || true
|
||||
cp -rfLv sources/sherpa-onnx/build/lib/*.dylib backend-assets/lib/ 2>/dev/null || true
|
||||
|
||||
# libsherpa-shim wraps sherpa-onnx's nested config structs and TTS
|
||||
# callback plumbing behind a purego-friendly API: opaque handles plus
|
||||
# fixed-signature setters/getters/trampoline. Plain C compile — no cgo.
|
||||
SHIM_EXT=so
|
||||
ifeq ($(OS),Darwin)
|
||||
SHIM_EXT=dylib
|
||||
endif
|
||||
|
||||
backend-assets/lib/libsherpa-shim.$(SHIM_EXT): csrc/shim.c csrc/shim.h backend-assets/lib
|
||||
$(CC) -shared -fPIC -O2 \
|
||||
-I$(CURRENT_DIR)/sources/sherpa-onnx/sherpa-onnx/c-api \
|
||||
-o $@ csrc/shim.c \
|
||||
-L$(CURRENT_DIR)/backend-assets/lib \
|
||||
-lsherpa-onnx-c-api \
|
||||
-Wl,-rpath,'$$ORIGIN'
|
||||
|
||||
sherpa-onnx: backend-assets/lib backend-assets/lib/libsherpa-shim.$(SHIM_EXT)
|
||||
CGO_ENABLED=0 $(GOCMD) build \
|
||||
-ldflags "$(LD_FLAGS) -X main.onnxProvider=$(ONNX_PROVIDER)" \
|
||||
-tags "$(GO_TAGS)" -o sherpa-onnx ./
|
||||
|
||||
package:
|
||||
bash package.sh
|
||||
|
||||
build: sherpa-onnx package
|
||||
|
||||
clean:
|
||||
rm -rf sherpa-onnx sources/ backend-assets/ package/ vits-ljs/ sherpa-onnx-whisper-*/
|
||||
|
||||
test: sherpa-onnx
|
||||
LD_LIBRARY_PATH=$(CURRENT_DIR)/backend-assets/lib \
|
||||
bash test.sh
|
||||
|
||||
.PHONY: build package clean test
|
||||
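The Makefile above picks the shim's file extension per platform (`so` by default, `dylib` on Darwin). Whatever loads the library at runtime needs the same mapping. The following is a minimal sketch of that mapping only; `shimLibraryName` is a hypothetical helper introduced for illustration and is not taken from the suppressed `backend.go`.

```go
package main

import (
	"fmt"
	"runtime"
)

// shimLibraryName is a hypothetical helper (an assumption, not the backend's
// actual code). It mirrors the Makefile's SHIM_EXT logic: "dylib" on Darwin,
// "so" everywhere else.
func shimLibraryName(goos string) string {
	ext := "so"
	if goos == "darwin" {
		ext = "dylib"
	}
	return "libsherpa-shim." + ext
}

func main() {
	// runtime.GOOS reports "darwin" on macOS and "linux" on Linux.
	fmt.Println(shimLibraryName(runtime.GOOS))
}
```

Passing the OS name as a parameter, rather than reading `runtime.GOOS` inside the helper, keeps the mapping testable on any host.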
1249
backend/go/sherpa-onnx/backend.go
Normal file
File diff suppressed because it is too large
169
backend/go/sherpa-onnx/backend_test.go
Normal file
@@ -0,0 +1,169 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
"testing"
|
||||
|
||||
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
|
||||
. "github.com/onsi/ginkgo/v2"
|
||||
. "github.com/onsi/gomega"
|
||||
)
|
||||
|
||||
func TestSherpaBackend(t *testing.T) {
|
||||
RegisterFailHandler(Fail)
|
||||
RunSpecs(t, "Sherpa-ONNX Backend Suite")
|
||||
}
|
||||
|
||||
// Load libsherpa-shim + libsherpa-onnx-c-api via purego before any spec
|
||||
// runs — otherwise any Load/TTS/VAD/AudioTranscription call hits a nil
|
||||
// function pointer. LD_LIBRARY_PATH must contain the directory holding
|
||||
// both .so files; test.sh sets this.
|
||||
var _ = BeforeSuite(func() {
|
||||
Expect(loadSherpaLibs()).To(Succeed())
|
||||
})
|
||||
|
||||
var _ = Describe("Sherpa-ONNX", func() {
|
||||
Context("lifecycle", func() {
|
||||
It("is locking (C API is not thread safe)", func() {
|
||||
Expect((&SherpaBackend{}).Locking()).To(BeTrue())
|
||||
})
|
||||
|
||||
It("errors loading a non-existent model", func() {
|
||||
tmpDir, err := os.MkdirTemp("", "sherpa-test-nonexistent")
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
defer os.RemoveAll(tmpDir)
|
||||
|
||||
err = (&SherpaBackend{}).Load(&pb.ModelOptions{
|
||||
ModelFile: filepath.Join(tmpDir, "non-existent-model.onnx"),
|
||||
})
|
||||
Expect(err).To(HaveOccurred())
|
||||
})
|
||||
|
||||
It("errors loading a non-existent ASR model", func() {
|
||||
tmpDir, err := os.MkdirTemp("", "sherpa-test-asr")
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
defer os.RemoveAll(tmpDir)
|
||||
|
||||
err = (&SherpaBackend{}).Load(&pb.ModelOptions{
|
||||
ModelFile: filepath.Join(tmpDir, "model.onnx"),
|
||||
Type: "asr",
|
||||
})
|
||||
Expect(err).To(HaveOccurred())
|
||||
})
|
||||
|
||||
It("dispatches Load by Type", func() {
|
||||
tmpDir, err := os.MkdirTemp("", "sherpa-test-dispatch")
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
defer os.RemoveAll(tmpDir)
|
||||
|
||||
modelFile := filepath.Join(tmpDir, "model.onnx")
|
||||
for _, typ := range []string{"", "asr", "vad"} {
|
||||
err := (&SherpaBackend{}).Load(&pb.ModelOptions{ModelFile: modelFile, Type: typ})
|
||||
Expect(err).To(HaveOccurred(), "Type=%q", typ)
|
||||
}
|
||||
})
|
||||
})
|
||||
|
||||
Context("method errors without loaded model", func() {
|
||||
It("rejects TTS", func() {
|
||||
tmpDir, err := os.MkdirTemp("", "sherpa-test-tts")
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
defer os.RemoveAll(tmpDir)
|
||||
|
||||
err = (&SherpaBackend{}).TTS(&pb.TTSRequest{
|
||||
Text: "should fail — no model loaded",
|
||||
Dst: filepath.Join(tmpDir, "output.wav"),
|
||||
})
|
||||
Expect(err).To(HaveOccurred())
|
||||
})
|
||||
|
||||
It("rejects AudioTranscription", func() {
|
||||
_, err := (&SherpaBackend{}).AudioTranscription(&pb.TranscriptRequest{
|
||||
Dst: "/tmp/nonexistent.wav",
|
||||
})
|
||||
Expect(err).To(HaveOccurred())
|
||||
})
|
||||
|
||||
It("rejects VAD", func() {
|
||||
_, err := (&SherpaBackend{}).VAD(&pb.VADRequest{
|
||||
Audio: []float32{0.1, 0.2, 0.3},
|
||||
})
|
||||
Expect(err).To(HaveOccurred())
|
||||
})
|
||||
})
|
||||
|
||||
Context("type detection", func() {
|
||||
DescribeTable("isASRType",
|
||||
func(input string, want bool) {
|
||||
Expect(isASRType(input)).To(Equal(want))
|
||||
},
|
||||
Entry("asr", "asr", true),
|
||||
Entry("ASR", "ASR", true),
|
||||
Entry("Asr", "Asr", true),
|
||||
Entry("transcription", "transcription", true),
|
||||
Entry("Transcription", "Transcription", true),
|
||||
Entry("transcribe", "transcribe", true),
|
||||
Entry("Transcribe", "Transcribe", true),
|
||||
Entry("tts", "tts", false),
|
||||
Entry("empty", "", false),
|
||||
Entry("other", "other", false),
|
||||
Entry("vad", "vad", false),
|
||||
)
|
||||
|
||||
DescribeTable("isVADType",
|
||||
func(input string, want bool) {
|
||||
Expect(isVADType(input)).To(Equal(want))
|
||||
},
|
||||
Entry("vad", "vad", true),
|
||||
Entry("VAD", "VAD", true),
|
||||
Entry("Vad", "Vad", true),
|
||||
Entry("asr", "asr", false),
|
||||
Entry("tts", "tts", false),
|
||||
Entry("empty", "", false),
|
||||
Entry("other", "other", false),
|
||||
)
|
||||
})
|
||||
|
||||
Context("option parsing", func() {
|
||||
It("parses float options with fallback on bad input", func() {
|
||||
opts := &pb.ModelOptions{Options: []string{
|
||||
"vad.threshold=0.3",
|
||||
"tts.length_scale=1.25",
|
||||
"bad.number=not-a-float",
|
||||
}}
|
||||
Expect(findOptionFloat(opts, "vad.threshold=", 0.5)).To(BeNumerically("~", 0.3, 1e-6))
|
||||
Expect(findOptionFloat(opts, "tts.length_scale=", 1.0)).To(BeNumerically("~", 1.25, 1e-6))
|
||||
Expect(findOptionFloat(opts, "missing.key=", 0.7)).To(BeNumerically("~", 0.7, 1e-6))
|
||||
Expect(findOptionFloat(opts, "bad.number=", 9.9)).To(BeNumerically("~", 9.9, 1e-6))
|
||||
})
|
||||
|
||||
It("parses int options with fallback on bad input", func() {
|
||||
opts := &pb.ModelOptions{Options: []string{
|
||||
"asr.sample_rate=22050",
|
||||
"online.chunk_samples=800",
|
||||
"bad.int=4.2",
|
||||
}}
|
||||
Expect(findOptionInt(opts, "asr.sample_rate=", 16000)).To(Equal(int32(22050)))
|
||||
Expect(findOptionInt(opts, "online.chunk_samples=", 1600)).To(Equal(int32(800)))
|
||||
Expect(findOptionInt(opts, "missing.key=", 42)).To(Equal(int32(42)))
|
||||
Expect(findOptionInt(opts, "bad.int=", 100)).To(Equal(int32(100)))
|
||||
})
|
||||
|
||||
It("parses bool options (0/1, true/false, yes/no, on/off)", func() {
|
||||
opts := &pb.ModelOptions{Options: []string{
|
||||
"online.enable_endpoint=0",
|
||||
"asr.sense_voice.use_itn=True",
|
||||
"feature.on=yes",
|
||||
"feature.off=Off",
|
||||
"feature.bad=maybe",
|
||||
}}
|
||||
Expect(findOptionBool(opts, "online.enable_endpoint=", 1)).To(Equal(int32(0)))
|
||||
Expect(findOptionBool(opts, "asr.sense_voice.use_itn=", 0)).To(Equal(int32(1)))
|
||||
Expect(findOptionBool(opts, "feature.on=", 0)).To(Equal(int32(1)))
|
||||
Expect(findOptionBool(opts, "feature.off=", 1)).To(Equal(int32(0)))
|
||||
Expect(findOptionBool(opts, "feature.bad=", 1)).To(Equal(int32(1)))
|
||||
Expect(findOptionBool(opts, "missing.key=", 1)).To(Equal(int32(1)))
|
||||
})
|
||||
})
|
||||
})
|
||||
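The option-parsing specs above pin down the observable behaviour: an option is a `prefix=value` string, a missing key or an unparsable value falls back to the default, and booleans accept 0/1, true/false, yes/no, on/off in any letter case. The sketch below is one way to satisfy those specs. It is an assumption, not the backend's implementation: the names `lookupOption`, `optionFloat`, `optionInt` and `optionBool` are hypothetical, and they take a plain `[]string` instead of `*pb.ModelOptions` so the example stays self-contained.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// lookupOption returns the text after the first entry that starts with
// prefix (for example "vad.threshold="), and whether such an entry exists.
func lookupOption(options []string, prefix string) (string, bool) {
	for _, o := range options {
		if strings.HasPrefix(o, prefix) {
			return strings.TrimPrefix(o, prefix), true
		}
	}
	return "", false
}

// optionFloat parses the value as a float32, returning def when the key is
// missing or the value is not a number.
func optionFloat(options []string, prefix string, def float32) float32 {
	raw, ok := lookupOption(options, prefix)
	if !ok {
		return def
	}
	v, err := strconv.ParseFloat(raw, 32)
	if err != nil {
		return def
	}
	return float32(v)
}

// optionInt parses the value as a base-10 int32, returning def when the key
// is missing or the value is not an integer (so "4.2" falls back).
func optionInt(options []string, prefix string, def int32) int32 {
	raw, ok := lookupOption(options, prefix)
	if !ok {
		return def
	}
	v, err := strconv.ParseInt(raw, 10, 32)
	if err != nil {
		return def
	}
	return int32(v)
}

// optionBool maps the accepted spellings to 1 or 0 (case-insensitive) and
// returns def for a missing key or an unrecognised value such as "maybe".
func optionBool(options []string, prefix string, def int32) int32 {
	raw, ok := lookupOption(options, prefix)
	if !ok {
		return def
	}
	switch strings.ToLower(raw) {
	case "1", "true", "yes", "on":
		return 1
	case "0", "false", "no", "off":
		return 0
	}
	return def
}

func main() {
	opts := []string{"tts.length_scale=1.25", "bad.int=4.2", "feature.off=Off"}
	fmt.Println(optionFloat(opts, "tts.length_scale=", 1.0))
	fmt.Println(optionInt(opts, "bad.int=", 100))
	fmt.Println(optionBool(opts, "feature.off=", 1))
}
```

Returning the default on any parse error, instead of an error value, matches what the specs expect: a malformed option never prevents the model from loading.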
325
backend/go/sherpa-onnx/csrc/shim.c
Normal file
@@ -0,0 +1,325 @@
|
||||
#include "shim.h"
|
||||
#include "c-api.h"
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
|
||||
// Replace the char* field pointed to by `slot` with a strdup of `s`
// (or NULL if s is NULL). Frees any prior value. If strdup fails, the
// slot is left NULL and no error is reported here; the caller is
// expected to see a Create* failure downstream.
|
||||
static void shim_set_str(const char **slot, const char *s) {
|
||||
free((char *)*slot);
|
||||
*slot = s ? strdup(s) : NULL;
|
||||
}
|
||||
|
||||
// ==================================================================
|
||||
// VAD config
|
||||
// ==================================================================
|
||||
|
||||
void *sherpa_shim_vad_config_new(void) {
|
||||
return calloc(1, sizeof(SherpaOnnxVadModelConfig));
|
||||
}
|
||||
|
||||
void sherpa_shim_vad_config_free(void *h) {
|
||||
if (!h) return;
|
||||
SherpaOnnxVadModelConfig *c = (SherpaOnnxVadModelConfig *)h;
|
||||
free((char *)c->silero_vad.model);
|
||||
free((char *)c->provider);
|
||||
free(c);
|
||||
}
|
||||
|
||||
void sherpa_shim_vad_config_set_silero_model(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxVadModelConfig *)h)->silero_vad.model, v);
|
||||
}
|
||||
void sherpa_shim_vad_config_set_silero_threshold(void *h, float v) {
|
||||
((SherpaOnnxVadModelConfig *)h)->silero_vad.threshold = v;
|
||||
}
|
||||
void sherpa_shim_vad_config_set_silero_min_silence_duration(void *h, float v) {
|
||||
((SherpaOnnxVadModelConfig *)h)->silero_vad.min_silence_duration = v;
|
||||
}
|
||||
void sherpa_shim_vad_config_set_silero_min_speech_duration(void *h, float v) {
|
||||
((SherpaOnnxVadModelConfig *)h)->silero_vad.min_speech_duration = v;
|
||||
}
|
||||
void sherpa_shim_vad_config_set_silero_window_size(void *h, int32_t v) {
|
||||
((SherpaOnnxVadModelConfig *)h)->silero_vad.window_size = v;
|
||||
}
|
||||
void sherpa_shim_vad_config_set_silero_max_speech_duration(void *h, float v) {
|
||||
((SherpaOnnxVadModelConfig *)h)->silero_vad.max_speech_duration = v;
|
||||
}
|
||||
void sherpa_shim_vad_config_set_sample_rate(void *h, int32_t v) {
|
||||
((SherpaOnnxVadModelConfig *)h)->sample_rate = v;
|
||||
}
|
||||
void sherpa_shim_vad_config_set_num_threads(void *h, int32_t v) {
|
||||
((SherpaOnnxVadModelConfig *)h)->num_threads = v;
|
||||
}
|
||||
void sherpa_shim_vad_config_set_provider(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxVadModelConfig *)h)->provider, v);
|
||||
}
|
||||
void sherpa_shim_vad_config_set_debug(void *h, int32_t v) {
|
||||
((SherpaOnnxVadModelConfig *)h)->debug = v;
|
||||
}
|
||||
|
||||
void *sherpa_shim_create_vad(void *h, float buffer_size_seconds) {
|
||||
return (void *)SherpaOnnxCreateVoiceActivityDetector(
|
||||
(const SherpaOnnxVadModelConfig *)h, buffer_size_seconds);
|
||||
}
|
||||
|
||||
// ==================================================================
|
||||
// Offline TTS config (VITS)
|
||||
// ==================================================================
|
||||
|
||||
void *sherpa_shim_tts_config_new(void) {
|
||||
return calloc(1, sizeof(SherpaOnnxOfflineTtsConfig));
|
||||
}
|
||||
|
||||
void sherpa_shim_tts_config_free(void *h) {
|
||||
if (!h) return;
|
||||
SherpaOnnxOfflineTtsConfig *c = (SherpaOnnxOfflineTtsConfig *)h;
|
||||
free((char *)c->model.vits.model);
|
||||
free((char *)c->model.vits.tokens);
|
||||
free((char *)c->model.vits.lexicon);
|
||||
free((char *)c->model.vits.data_dir);
|
||||
free((char *)c->model.provider);
|
||||
free(c);
|
||||
}
|
||||
|
||||
void sherpa_shim_tts_config_set_vits_model(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.vits.model, v);
|
||||
}
|
||||
void sherpa_shim_tts_config_set_vits_tokens(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.vits.tokens, v);
|
||||
}
|
||||
void sherpa_shim_tts_config_set_vits_lexicon(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.vits.lexicon, v);
|
||||
}
|
||||
void sherpa_shim_tts_config_set_vits_data_dir(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.vits.data_dir, v);
|
||||
}
|
||||
void sherpa_shim_tts_config_set_vits_noise_scale(void *h, float v) {
|
||||
((SherpaOnnxOfflineTtsConfig *)h)->model.vits.noise_scale = v;
|
||||
}
|
||||
void sherpa_shim_tts_config_set_vits_noise_scale_w(void *h, float v) {
|
||||
((SherpaOnnxOfflineTtsConfig *)h)->model.vits.noise_scale_w = v;
|
||||
}
|
||||
void sherpa_shim_tts_config_set_vits_length_scale(void *h, float v) {
|
||||
((SherpaOnnxOfflineTtsConfig *)h)->model.vits.length_scale = v;
|
||||
}
|
||||
void sherpa_shim_tts_config_set_num_threads(void *h, int32_t v) {
|
||||
((SherpaOnnxOfflineTtsConfig *)h)->model.num_threads = v;
|
||||
}
|
||||
void sherpa_shim_tts_config_set_debug(void *h, int32_t v) {
|
||||
((SherpaOnnxOfflineTtsConfig *)h)->model.debug = v;
|
||||
}
|
||||
void sherpa_shim_tts_config_set_provider(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineTtsConfig *)h)->model.provider, v);
|
||||
}
|
||||
void sherpa_shim_tts_config_set_max_num_sentences(void *h, int32_t v) {
|
||||
((SherpaOnnxOfflineTtsConfig *)h)->max_num_sentences = v;
|
||||
}
|
||||
|
||||
void *sherpa_shim_create_offline_tts(void *h) {
|
||||
return (void *)SherpaOnnxCreateOfflineTts(
|
||||
(const SherpaOnnxOfflineTtsConfig *)h);
|
||||
}
|
||||
|
||||
// ==================================================================
|
||||
// Offline recognizer config
|
||||
// ==================================================================
|
||||
|
||||
void *sherpa_shim_offline_recog_config_new(void) {
|
||||
return calloc(1, sizeof(SherpaOnnxOfflineRecognizerConfig));
|
||||
}
|
||||
|
||||
void sherpa_shim_offline_recog_config_free(void *h) {
|
||||
if (!h) return;
|
||||
SherpaOnnxOfflineRecognizerConfig *c = (SherpaOnnxOfflineRecognizerConfig *)h;
|
||||
free((char *)c->model_config.provider);
|
||||
free((char *)c->model_config.tokens);
|
||||
free((char *)c->model_config.whisper.encoder);
|
||||
free((char *)c->model_config.whisper.decoder);
|
||||
free((char *)c->model_config.whisper.language);
|
||||
free((char *)c->model_config.whisper.task);
|
||||
free((char *)c->model_config.paraformer.model);
|
||||
free((char *)c->model_config.sense_voice.model);
|
||||
free((char *)c->model_config.sense_voice.language);
|
||||
free((char *)c->model_config.omnilingual.model);
|
||||
free((char *)c->decoding_method);
|
||||
free(c);
|
||||
}
|
||||
|
||||
void sherpa_shim_offline_recog_config_set_num_threads(void *h, int32_t v) {
|
||||
((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.num_threads = v;
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_debug(void *h, int32_t v) {
|
||||
((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.debug = v;
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_provider(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.provider, v);
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_tokens(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.tokens, v);
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_feat_sample_rate(void *h, int32_t v) {
|
||||
((SherpaOnnxOfflineRecognizerConfig *)h)->feat_config.sample_rate = v;
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_feat_feature_dim(void *h, int32_t v) {
|
||||
((SherpaOnnxOfflineRecognizerConfig *)h)->feat_config.feature_dim = v;
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_decoding_method(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->decoding_method, v);
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_whisper_encoder(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.whisper.encoder, v);
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_whisper_decoder(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.whisper.decoder, v);
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_whisper_language(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.whisper.language, v);
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_whisper_task(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.whisper.task, v);
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_whisper_tail_paddings(void *h, int32_t v) {
|
||||
((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.whisper.tail_paddings = v;
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_paraformer_model(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.paraformer.model, v);
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_sense_voice_model(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.sense_voice.model, v);
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_sense_voice_language(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.sense_voice.language, v);
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_sense_voice_use_itn(void *h, int32_t v) {
|
||||
((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.sense_voice.use_itn = v;
|
||||
}
|
||||
void sherpa_shim_offline_recog_config_set_omnilingual_model(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOfflineRecognizerConfig *)h)->model_config.omnilingual.model, v);
|
||||
}
|
||||
|
||||
void *sherpa_shim_create_offline_recognizer(void *h) {
|
||||
return (void *)SherpaOnnxCreateOfflineRecognizer(
|
||||
(const SherpaOnnxOfflineRecognizerConfig *)h);
|
||||
}
|
||||
|
||||
// ==================================================================
|
||||
// Online recognizer config
|
||||
// ==================================================================
|
||||
|
||||
void *sherpa_shim_online_recog_config_new(void) {
|
||||
return calloc(1, sizeof(SherpaOnnxOnlineRecognizerConfig));
|
||||
}
|
||||
|
||||
void sherpa_shim_online_recog_config_free(void *h) {
|
||||
if (!h) return;
|
||||
SherpaOnnxOnlineRecognizerConfig *c = (SherpaOnnxOnlineRecognizerConfig *)h;
|
||||
free((char *)c->model_config.transducer.encoder);
|
||||
free((char *)c->model_config.transducer.decoder);
|
||||
free((char *)c->model_config.transducer.joiner);
|
||||
free((char *)c->model_config.tokens);
|
||||
free((char *)c->model_config.provider);
|
||||
free((char *)c->decoding_method);
|
||||
free(c);
|
||||
}
|
||||
|
||||
void sherpa_shim_online_recog_config_set_transducer_encoder(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.transducer.encoder, v);
|
||||
}
|
||||
void sherpa_shim_online_recog_config_set_transducer_decoder(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.transducer.decoder, v);
|
||||
}
|
||||
void sherpa_shim_online_recog_config_set_transducer_joiner(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.transducer.joiner, v);
|
||||
}
|
||||
void sherpa_shim_online_recog_config_set_tokens(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.tokens, v);
|
||||
}
|
||||
void sherpa_shim_online_recog_config_set_num_threads(void *h, int32_t v) {
|
||||
((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.num_threads = v;
|
||||
}
|
||||
void sherpa_shim_online_recog_config_set_debug(void *h, int32_t v) {
|
||||
((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.debug = v;
|
||||
}
|
||||
void sherpa_shim_online_recog_config_set_provider(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOnlineRecognizerConfig *)h)->model_config.provider, v);
|
||||
}
|
||||
void sherpa_shim_online_recog_config_set_feat_sample_rate(void *h, int32_t v) {
|
||||
((SherpaOnnxOnlineRecognizerConfig *)h)->feat_config.sample_rate = v;
|
||||
}
|
||||
void sherpa_shim_online_recog_config_set_feat_feature_dim(void *h, int32_t v) {
|
||||
((SherpaOnnxOnlineRecognizerConfig *)h)->feat_config.feature_dim = v;
|
||||
}
|
||||
void sherpa_shim_online_recog_config_set_decoding_method(void *h, const char *v) {
|
||||
shim_set_str(&((SherpaOnnxOnlineRecognizerConfig *)h)->decoding_method, v);
|
||||
}
|
||||
void sherpa_shim_online_recog_config_set_enable_endpoint(void *h, int32_t v) {
|
||||
((SherpaOnnxOnlineRecognizerConfig *)h)->enable_endpoint = v;
|
||||
}
|
||||
void sherpa_shim_online_recog_config_set_rule1_min_trailing_silence(void *h, float v) {
|
||||
((SherpaOnnxOnlineRecognizerConfig *)h)->rule1_min_trailing_silence = v;
|
||||
}
|
||||
void sherpa_shim_online_recog_config_set_rule2_min_trailing_silence(void *h, float v) {
|
||||
((SherpaOnnxOnlineRecognizerConfig *)h)->rule2_min_trailing_silence = v;
|
||||
}
|
||||
void sherpa_shim_online_recog_config_set_rule3_min_utterance_length(void *h, float v) {
|
||||
((SherpaOnnxOnlineRecognizerConfig *)h)->rule3_min_utterance_length = v;
|
||||
}
|
||||
|
||||
void *sherpa_shim_create_online_recognizer(void *h) {
|
||||
return (void *)SherpaOnnxCreateOnlineRecognizer(
|
||||
(const SherpaOnnxOnlineRecognizerConfig *)h);
|
||||
}
|
||||
|
||||
// ==================================================================
|
||||
// Result-struct accessors
|
||||
// ==================================================================
|
||||
|
||||
int32_t sherpa_shim_wave_sample_rate(const void *h) {
|
||||
return ((const SherpaOnnxWave *)h)->sample_rate;
|
||||
}
|
||||
int32_t sherpa_shim_wave_num_samples(const void *h) {
|
||||
return ((const SherpaOnnxWave *)h)->num_samples;
|
||||
}
|
||||
const float *sherpa_shim_wave_samples(const void *h) {
|
||||
return ((const SherpaOnnxWave *)h)->samples;
|
||||
}
|
||||
|
||||
const char *sherpa_shim_offline_result_text(const void *h) {
|
||||
return ((const SherpaOnnxOfflineRecognizerResult *)h)->text;
|
||||
}
|
||||
const char *sherpa_shim_online_result_text(const void *h) {
|
||||
return ((const SherpaOnnxOnlineRecognizerResult *)h)->text;
|
||||
}
|
||||
|
||||
int32_t sherpa_shim_generated_audio_sample_rate(const void *h) {
|
||||
return ((const SherpaOnnxGeneratedAudio *)h)->sample_rate;
|
||||
}
|
||||
int32_t sherpa_shim_generated_audio_n(const void *h) {
|
||||
return ((const SherpaOnnxGeneratedAudio *)h)->n;
|
||||
}
|
||||
const float *sherpa_shim_generated_audio_samples(const void *h) {
|
||||
return ((const SherpaOnnxGeneratedAudio *)h)->samples;
|
||||
}
|
||||
|
||||
int32_t sherpa_shim_speech_segment_start(const void *h) {
|
||||
return ((const SherpaOnnxSpeechSegment *)h)->start;
|
||||
}
|
||||
int32_t sherpa_shim_speech_segment_n(const void *h) {
|
||||
return ((const SherpaOnnxSpeechSegment *)h)->n;
|
||||
}
|
||||
|
||||
// ==================================================================
|
||||
// TTS streaming callback trampoline
|
||||
// ==================================================================
|
||||
|
||||
void *sherpa_shim_tts_generate_with_callback(
|
||||
void *tts, const char *text, int32_t sid, float speed,
|
||||
uintptr_t callback_ptr, uintptr_t user_data) {
|
||||
SherpaOnnxGeneratedAudioCallbackWithArg cb =
|
||||
(SherpaOnnxGeneratedAudioCallbackWithArg)callback_ptr;
|
||||
return (void *)SherpaOnnxOfflineTtsGenerateWithCallbackWithArg(
|
||||
(const SherpaOnnxOfflineTts *)tts, text, sid, speed, cb,
|
||||
(void *)user_data);
|
||||
}
|
||||
129
backend/go/sherpa-onnx/csrc/shim.h
Normal file
@@ -0,0 +1,129 @@
|
||||
#ifndef LOCALAI_SHERPA_ONNX_SHIM_H
|
||||
#define LOCALAI_SHERPA_ONNX_SHIM_H
|
||||
|
||||
#include <stdint.h>
|
||||
|
||||
// libsherpa-shim: purego-friendly wrapper around sherpa-onnx's C API.
|
||||
// Purego can't access C struct fields and can't route C callbacks to Go
|
||||
// funcs directly. Every function here is a fixed-signature trampoline
|
||||
// that replaces one field read/write or callback handoff that the Go
|
||||
// backend would otherwise have to do through cgo.
|
||||
//
|
||||
// String lifetime: setters strdup; _free walks every owned string and
|
||||
// frees it. Callers may discard their input buffers the moment a setter
|
||||
// returns.
|
||||
//
|
||||
// Opaque handles are `void *` in both directions. Nothing here holds a
|
||||
// reference across calls except config handles (freed via _free) and
|
||||
// sherpa-allocated results (freed via sherpa's own Destroy* entry
|
||||
// points, which Go calls through purego pass-through).
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
|
||||
// --- VAD config -----------------------------------------------------
|
||||
void *sherpa_shim_vad_config_new(void);
|
||||
void sherpa_shim_vad_config_free(void *cfg);
|
||||
void sherpa_shim_vad_config_set_silero_model(void *cfg, const char *path);
|
||||
void sherpa_shim_vad_config_set_silero_threshold(void *cfg, float v);
|
||||
void sherpa_shim_vad_config_set_silero_min_silence_duration(void *cfg, float v);
|
||||
void sherpa_shim_vad_config_set_silero_min_speech_duration(void *cfg, float v);
|
||||
void sherpa_shim_vad_config_set_silero_window_size(void *cfg, int32_t v);
|
||||
void sherpa_shim_vad_config_set_silero_max_speech_duration(void *cfg, float v);
|
||||
void sherpa_shim_vad_config_set_sample_rate(void *cfg, int32_t v);
|
||||
void sherpa_shim_vad_config_set_num_threads(void *cfg, int32_t v);
|
||||
void sherpa_shim_vad_config_set_provider(void *cfg, const char *v);
|
||||
void sherpa_shim_vad_config_set_debug(void *cfg, int32_t v);
|
||||
void *sherpa_shim_create_vad(void *cfg, float buffer_size_seconds);
|
||||
|
||||
// --- Offline TTS config (VITS path — the only TTS family the backend uses) ---
|
||||
void *sherpa_shim_tts_config_new(void);
|
||||
void sherpa_shim_tts_config_free(void *cfg);
|
||||
void sherpa_shim_tts_config_set_vits_model(void *cfg, const char *v);
|
||||
void sherpa_shim_tts_config_set_vits_tokens(void *cfg, const char *v);
|
||||
void sherpa_shim_tts_config_set_vits_lexicon(void *cfg, const char *v);
|
||||
void sherpa_shim_tts_config_set_vits_data_dir(void *cfg, const char *v);
|
||||
void sherpa_shim_tts_config_set_vits_noise_scale(void *cfg, float v);
|
||||
void sherpa_shim_tts_config_set_vits_noise_scale_w(void *cfg, float v);
|
||||
void sherpa_shim_tts_config_set_vits_length_scale(void *cfg, float v);
|
||||
void sherpa_shim_tts_config_set_num_threads(void *cfg, int32_t v);
|
||||
void sherpa_shim_tts_config_set_debug(void *cfg, int32_t v);
|
||||
void sherpa_shim_tts_config_set_provider(void *cfg, const char *v);
|
||||
void sherpa_shim_tts_config_set_max_num_sentences(void *cfg, int32_t v);
|
||||
void *sherpa_shim_create_offline_tts(void *cfg);
|
||||
|
||||
// --- Offline recognizer config (Whisper / Paraformer / SenseVoice / Omnilingual) ---
|
||||
void *sherpa_shim_offline_recog_config_new(void);
|
||||
void sherpa_shim_offline_recog_config_free(void *cfg);
|
||||
void sherpa_shim_offline_recog_config_set_num_threads(void *cfg, int32_t v);
|
||||
void sherpa_shim_offline_recog_config_set_debug(void *cfg, int32_t v);
|
||||
void sherpa_shim_offline_recog_config_set_provider(void *cfg, const char *v);
|
||||
void sherpa_shim_offline_recog_config_set_tokens(void *cfg, const char *v);
|
||||
void sherpa_shim_offline_recog_config_set_feat_sample_rate(void *cfg, int32_t v);
|
||||
void sherpa_shim_offline_recog_config_set_feat_feature_dim(void *cfg, int32_t v);
|
||||
void sherpa_shim_offline_recog_config_set_decoding_method(void *cfg, const char *v);
|
||||
void sherpa_shim_offline_recog_config_set_whisper_encoder(void *cfg, const char *v);
|
||||
void sherpa_shim_offline_recog_config_set_whisper_decoder(void *cfg, const char *v);
|
||||
void sherpa_shim_offline_recog_config_set_whisper_language(void *cfg, const char *v);
|
||||
void sherpa_shim_offline_recog_config_set_whisper_task(void *cfg, const char *v);
|
||||
void sherpa_shim_offline_recog_config_set_whisper_tail_paddings(void *cfg, int32_t v);
|
||||
void sherpa_shim_offline_recog_config_set_paraformer_model(void *cfg, const char *v);
|
||||
void sherpa_shim_offline_recog_config_set_sense_voice_model(void *cfg, const char *v);
|
||||
void sherpa_shim_offline_recog_config_set_sense_voice_language(void *cfg, const char *v);
|
||||
void sherpa_shim_offline_recog_config_set_sense_voice_use_itn(void *cfg, int32_t v);
|
||||
void sherpa_shim_offline_recog_config_set_omnilingual_model(void *cfg, const char *v);
|
||||
void *sherpa_shim_create_offline_recognizer(void *cfg);
|
||||
|
||||
// --- Online recognizer config (streaming zipformer transducer) ---
|
||||
void *sherpa_shim_online_recog_config_new(void);
|
||||
void sherpa_shim_online_recog_config_free(void *cfg);
|
||||
void sherpa_shim_online_recog_config_set_transducer_encoder(void *cfg, const char *v);
|
||||
void sherpa_shim_online_recog_config_set_transducer_decoder(void *cfg, const char *v);
|
||||
void sherpa_shim_online_recog_config_set_transducer_joiner(void *cfg, const char *v);
|
||||
void sherpa_shim_online_recog_config_set_tokens(void *cfg, const char *v);
|
||||
void sherpa_shim_online_recog_config_set_num_threads(void *cfg, int32_t v);
|
||||
void sherpa_shim_online_recog_config_set_debug(void *cfg, int32_t v);
|
||||
void sherpa_shim_online_recog_config_set_provider(void *cfg, const char *v);
|
||||
void sherpa_shim_online_recog_config_set_feat_sample_rate(void *cfg, int32_t v);
|
||||
void sherpa_shim_online_recog_config_set_feat_feature_dim(void *cfg, int32_t v);
|
||||
void sherpa_shim_online_recog_config_set_decoding_method(void *cfg, const char *v);
|
||||
void sherpa_shim_online_recog_config_set_enable_endpoint(void *cfg, int32_t v);
|
||||
void sherpa_shim_online_recog_config_set_rule1_min_trailing_silence(void *cfg, float v);
|
||||
void sherpa_shim_online_recog_config_set_rule2_min_trailing_silence(void *cfg, float v);
|
||||
void sherpa_shim_online_recog_config_set_rule3_min_utterance_length(void *cfg, float v);
|
||||
void *sherpa_shim_create_online_recognizer(void *cfg);
|
||||
|
||||
// --- Result accessors (sherpa-allocated; caller destroys via sherpa's own Destroy*) ---
|
||||
int32_t sherpa_shim_wave_sample_rate(const void *wave);
|
||||
int32_t sherpa_shim_wave_num_samples(const void *wave);
|
||||
const float *sherpa_shim_wave_samples(const void *wave);
|
||||
|
||||
const char *sherpa_shim_offline_result_text(const void *result);
|
||||
const char *sherpa_shim_online_result_text(const void *result);
|
||||
|
||||
int32_t sherpa_shim_generated_audio_sample_rate(const void *audio);
|
||||
int32_t sherpa_shim_generated_audio_n(const void *audio);
|
||||
const float *sherpa_shim_generated_audio_samples(const void *audio);
|
||||
|
||||
int32_t sherpa_shim_speech_segment_start(const void *seg);
|
||||
int32_t sherpa_shim_speech_segment_n(const void *seg);
|
||||
|
||||
// --- TTS streaming callback trampoline -----------------------------
|
||||
// Replaces the //export sherpaTtsGoCallback + callbacks.c bridge pattern.
|
||||
// `callback_ptr` is the C-callable function pointer returned by
|
||||
// purego.NewCallback. `user_data` is an integer the Go side uses to
|
||||
// look up its state (sync.Map keyed by uint64).
|
||||
//
|
||||
// Returns the sherpa-allocated SherpaOnnxGeneratedAudio. Destroy with
|
||||
// SherpaOnnxDestroyOfflineTtsGeneratedAudio (callable directly from
|
||||
// Go via purego).
|
||||
void *sherpa_shim_tts_generate_with_callback(
|
||||
void *tts, const char *text, int32_t sid, float speed,
|
||||
uintptr_t callback_ptr, uintptr_t user_data);
|
||||
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif
|
||||
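The header's note on the TTS trampoline says `user_data` is an integer that the Go side uses to look up its state in a `sync.Map` keyed by `uint64`. Passing an integer key instead of a Go pointer keeps Go memory out of C's hands. Below is a minimal sketch of such a registry, using only the standard library. All names (`ttsState`, `registerState`, `onChunk`, and so on) are hypothetical, and `onChunk` only stands in for the function that would be handed to `purego.NewCallback`; the return convention (1 to continue, 0 to stop) is an assumption about sherpa-onnx's callback contract.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ttsState is hypothetical per-request state collected by the callback.
type ttsState struct {
	samples []float32
}

var (
	callbackStates sync.Map      // uint64 -> *ttsState
	nextStateID    atomic.Uint64 // monotonically increasing key source
)

// registerState stores s and returns the integer key that would be passed
// to C as user_data.
func registerState(s *ttsState) uint64 {
	id := nextStateID.Add(1)
	callbackStates.Store(id, s)
	return id
}

// lookupState resolves a key received from C back to its Go state.
func lookupState(id uint64) (*ttsState, bool) {
	v, ok := callbackStates.Load(id)
	if !ok {
		return nil, false
	}
	return v.(*ttsState), true
}

// releaseState drops the entry once generation has finished.
func releaseState(id uint64) {
	callbackStates.Delete(id)
}

// onChunk stands in for the Go function wrapped by purego.NewCallback.
// Assumption: returning 1 asks the generator to continue, 0 asks it to stop.
func onChunk(id uint64, chunk []float32) int32 {
	s, ok := lookupState(id)
	if !ok {
		return 0
	}
	s.samples = append(s.samples, chunk...)
	return 1
}

func main() {
	id := registerState(&ttsState{})
	defer releaseState(id)

	onChunk(id, []float32{0.1, 0.2})
	onChunk(id, []float32{0.3})

	s, _ := lookupState(id)
	fmt.Println(len(s.samples))
}
```

Releasing the entry after the generate call returns matters: without it the map would keep every request's audio buffer alive.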
23
backend/go/sherpa-onnx/main.go
Normal file
@@ -0,0 +1,23 @@
package main

import (
	"flag"

	grpc "github.com/mudler/LocalAI/pkg/grpc"
)

var (
	addr = flag.String("addr", "localhost:50051", "the address to connect to")
)

func main() {
	flag.Parse()

	if err := loadSherpaLibs(); err != nil {
		panic(err)
	}

	if err := grpc.StartServer(*addr, &SherpaBackend{}); err != nil {
		panic(err)
	}
}
51
backend/go/sherpa-onnx/package.sh
Executable file
@@ -0,0 +1,51 @@
|
||||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
CURDIR=$(dirname "$(realpath $0)")
|
||||
REPO_ROOT="${CURDIR}/../../.."
|
||||
|
||||
mkdir -p $CURDIR/package/lib
|
||||
|
||||
cp -avf $CURDIR/sherpa-onnx $CURDIR/package/
|
||||
cp -avf $CURDIR/run.sh $CURDIR/package/
|
||||
cp -rfLv $CURDIR/backend-assets/lib/* $CURDIR/package/lib/
|
||||
|
||||
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
|
||||
echo "Detected x86_64 architecture, copying x86_64 libraries..."
|
||||
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so
|
||||
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
|
||||
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
|
||||
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
|
||||
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
|
||||
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
|
||||
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
|
||||
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
|
||||
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
|
||||
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
|
||||
echo "Detected ARM64 architecture, copying ARM64 libraries..."
|
||||
cp -arfLv /lib/ld-linux-aarch64.so.1 $CURDIR/package/lib/ld.so
|
||||
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
|
||||
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
|
||||
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
|
||||
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
|
||||
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
|
||||
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
|
||||
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
|
||||
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
|
||||
elif [ $(uname -s) = "Darwin" ]; then
|
||||
echo "Detected Darwin"
|
||||
else
|
||||
echo "Error: Could not detect architecture"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
|
||||
if [ -f "$GPU_LIB_SCRIPT" ]; then
|
||||
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
|
||||
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
|
||||
package_gpu_libs
|
||||
fi
|
||||
|
||||
echo "Packaging completed successfully"
|
||||
ls -liah $CURDIR/package/
|
||||
ls -liah $CURDIR/package/lib/
|
||||
13
backend/go/sherpa-onnx/run.sh
Executable file
@@ -0,0 +1,13 @@
|
||||
#!/bin/bash
|
||||
set -ex
|
||||
|
||||
CURDIR=$(dirname "$(realpath $0)")
|
||||
|
||||
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
|
||||
|
||||
if [ -f $CURDIR/lib/ld.so ]; then
|
||||
echo "Using lib/ld.so"
|
||||
exec $CURDIR/lib/ld.so $CURDIR/sherpa-onnx "$@"
|
||||
fi
|
||||
|
||||
exec $CURDIR/sherpa-onnx "$@"
|
||||
12
backend/go/sherpa-onnx/test.sh
Executable file
@@ -0,0 +1,12 @@
|
||||
#!/bin/bash
|
||||
# Unit tests for the sherpa-onnx backend. Exercises error-path and
|
||||
# dispatch logic via SherpaBackend directly (no gRPC). Integration
|
||||
# coverage (gRPC TTS / streaming ASR / realtime pipeline) lives in
|
||||
# tests/e2e-backends and tests/e2e and runs against the Docker image.
|
||||
set -e
|
||||
|
||||
CURDIR=$(dirname "$(realpath $0)")
|
||||
cd "$CURDIR"
|
||||
|
||||
PACKAGES=$(go list ./... | grep -v /sources/)
|
||||
go test -v -timeout 60s $PACKAGES
|
||||
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
|
||||
|
||||
# stablediffusion.cpp (ggml)
|
||||
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
|
||||
STABLEDIFFUSION_GGML_VERSION?=c97702e1057c2fe13a7074cd9069cb9dd6edc1bf
|
||||
STABLEDIFFUSION_GGML_VERSION?=b8bdffc19962be7e5a84bfefeb2e31bd885b571a
|
||||
|
||||
CMAKE_ARGS+=-DGGML_MAX_NAME=128
|
||||
|
||||
|
||||
@@ -139,7 +139,10 @@ func (w *Whisper) AudioTranscription(opts *pb.TranscriptRequest) (pb.TranscriptR
|
||||
// segment start/end conversion factor taken from https://github.com/ggml-org/whisper.cpp/blob/master/examples/cli/cli.cpp#L895
|
||||
s := CppGetSegmentStart(i) * (10000000)
|
||||
t := CppGetSegmentEnd(i) * (10000000)
|
||||
txt := strings.Clone(CppGetSegmentText(i))
|
||||
// whisper.cpp can emit bytes that aren't valid UTF-8 (e.g. a multibyte
|
||||
// codepoint split across token boundaries); protobuf string fields
|
||||
// reject those at marshal time. Scrub before the value escapes cgo.
|
||||
txt := strings.ToValidUTF8(strings.Clone(CppGetSegmentText(i)), "<22>")
|
||||
tokens := make([]int32, CppNTokens(i))
|
||||
|
||||
if opts.Diarize && CppGetSegmentSpeakerTurnNext(i) {
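
The hunk above scrubs invalid UTF-8 before the text reaches protobuf. The same failure mode (a multibyte codepoint cut at a boundary) can be reproduced in a few lines; this is a Python sketch of the idea, not the backend's Go code, and `scrub_utf8` is a hypothetical helper name.

```python
def scrub_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for anything undecodable."""
    return raw.decode("utf-8", errors="replace")


# "é" is the two bytes 0xC3 0xA9 in UTF-8; dropping the last byte leaves a
# dangling lead byte, similar to a codepoint split across token boundaries.
truncated = "café".encode("utf-8")[:-1]
clean = scrub_utf8(truncated)
```

One difference worth noting: Go's `strings.ToValidUTF8` collapses a whole run of invalid bytes into a single replacement string, whereas Python's `errors="replace"` emits one U+FFFD per malformed sequence, so the two can differ in how many replacement characters appear.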
|
||||
|
||||
@@ -263,6 +263,8 @@
|
||||
amd: "rocm-vllm"
|
||||
intel: "intel-vllm"
|
||||
nvidia-cuda-12: "cuda12-vllm"
|
||||
nvidia-cuda-13: "cuda13-vllm"
|
||||
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm"
|
||||
cpu: "cpu-vllm"
|
||||
- &sglang
|
||||
name: "sglang"
|
||||
@@ -285,6 +287,7 @@
|
||||
amd: "rocm-sglang"
|
||||
intel: "intel-sglang"
|
||||
nvidia-cuda-12: "cuda12-sglang"
|
||||
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sglang"
|
||||
cpu: "cpu-sglang"
|
||||
- &vllm-omni
|
||||
name: "vllm-omni"
|
||||
@@ -311,6 +314,8 @@
|
||||
nvidia: "cuda12-vllm-omni"
|
||||
amd: "rocm-vllm-omni"
|
||||
nvidia-cuda-12: "cuda12-vllm-omni"
|
||||
nvidia-cuda-13: "cuda13-vllm-omni"
|
||||
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-omni"
|
||||
- &mlx
|
||||
name: "mlx"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx"
|
||||
@@ -1006,6 +1011,23 @@
|
||||
nvidia: "cuda12-neutts"
|
||||
amd: "rocm-neutts"
|
||||
nvidia-cuda-12: "cuda12-neutts"
|
||||
- &sherpa-onnx
|
||||
name: "sherpa-onnx"
|
||||
alias: "sherpa-onnx"
|
||||
urls:
|
||||
- https://k2-fsa.github.io/sherpa/onnx/
|
||||
description: |
|
||||
Sherpa-ONNX backend for text-to-speech (VITS, Matcha, Kokoro), speech-to-text (Whisper, Paraformer, SenseVoice, Omnilingual ASR CTC), and voice activity detection via ONNX Runtime.
|
||||
Supports multi-speaker voices, 1600+ language ASR, and GPU acceleration.
|
||||
tags:
|
||||
- text-to-speech
|
||||
- TTS
|
||||
- speech-to-text
|
||||
- ASR
|
||||
capabilities:
|
||||
default: "cpu-sherpa-onnx"
|
||||
nvidia: "cuda12-sherpa-onnx"
|
||||
nvidia-cuda-12: "cuda12-sherpa-onnx"
|
||||
- !!merge <<: *neutts
|
||||
name: "neutts-development"
|
||||
capabilities:
|
||||
@@ -1591,6 +1613,20 @@
|
||||
mirrors:
|
||||
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-turboquant
|
||||
## whisper
|
||||
- !!merge <<: *whispercpp
|
||||
name: "whisper-development"
|
||||
capabilities:
|
||||
default: "cpu-whisper-development"
|
||||
nvidia: "cuda12-whisper-development"
|
||||
intel: "intel-sycl-f16-whisper-development"
|
||||
metal: "metal-whisper-development"
|
||||
amd: "rocm-whisper-development"
|
||||
vulkan: "vulkan-whisper-development"
|
||||
nvidia-l4t: "nvidia-l4t-arm64-whisper-development"
|
||||
nvidia-cuda-13: "cuda13-whisper-development"
|
||||
nvidia-cuda-12: "cuda12-whisper-development"
|
||||
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-whisper-development"
|
||||
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-whisper-development"
|
||||
- !!merge <<: *whispercpp
|
||||
name: "nvidia-l4t-arm64-whisper"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-whisper"
|
||||
@@ -1797,12 +1833,25 @@
|
||||
nvidia: "cuda12-vllm-development"
|
||||
amd: "rocm-vllm-development"
|
||||
intel: "intel-vllm-development"
|
||||
nvidia-cuda-12: "cuda12-vllm-development"
|
||||
nvidia-cuda-13: "cuda13-vllm-development"
|
||||
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-development"
|
||||
cpu: "cpu-vllm-development"
|
||||
- !!merge <<: *vllm
|
||||
name: "cuda12-vllm"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-nvidia-cuda-12-vllm
|
||||
- !!merge <<: *vllm
|
||||
name: "cuda13-vllm"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-vllm"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-nvidia-cuda-13-vllm
|
||||
- !!merge <<: *vllm
|
||||
name: "cuda13-nvidia-l4t-arm64-vllm"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm
|
||||
- !!merge <<: *vllm
|
||||
name: "rocm-vllm"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-vllm"
|
||||
@@ -1823,6 +1872,16 @@
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-vllm"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-nvidia-cuda-12-vllm
|
||||
- !!merge <<: *vllm
|
||||
name: "cuda13-vllm-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-vllm"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-nvidia-cuda-13-vllm
|
||||
- !!merge <<: *vllm
|
||||
name: "cuda13-nvidia-l4t-arm64-vllm-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-vllm"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-vllm
|
||||
- !!merge <<: *vllm
|
||||
name: "rocm-vllm-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-vllm"
|
||||
@@ -1845,12 +1904,19 @@
|
||||
nvidia: "cuda12-sglang-development"
|
||||
amd: "rocm-sglang-development"
|
||||
intel: "intel-sglang-development"
|
||||
nvidia-cuda-12: "cuda12-sglang-development"
|
||||
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sglang-development"
|
||||
cpu: "cpu-sglang-development"
|
||||
- !!merge <<: *sglang
|
||||
name: "cuda12-sglang"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-sglang"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-nvidia-cuda-12-sglang
|
||||
- !!merge <<: *sglang
|
||||
name: "cuda13-nvidia-l4t-arm64-sglang"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-sglang"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-sglang
|
||||
- !!merge <<: *sglang
|
||||
name: "rocm-sglang"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-sglang"
|
||||
@@ -1871,6 +1937,11 @@
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-sglang"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-nvidia-cuda-12-sglang
|
||||
- !!merge <<: *sglang
|
||||
name: "cuda13-nvidia-l4t-arm64-sglang-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-sglang"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-sglang
|
||||
- !!merge <<: *sglang
|
||||
name: "rocm-sglang-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-sglang"
|
||||
@@ -1893,11 +1964,23 @@
|
||||
nvidia: "cuda12-vllm-omni-development"
|
||||
amd: "rocm-vllm-omni-development"
|
||||
nvidia-cuda-12: "cuda12-vllm-omni-development"
|
||||
nvidia-cuda-13: "cuda13-vllm-omni-development"
|
||||
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-omni-development"
|
||||
- !!merge <<: *vllm-omni
|
||||
name: "cuda12-vllm-omni"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm-omni"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-nvidia-cuda-12-vllm-omni
|
||||
- !!merge <<: *vllm-omni
|
||||
name: "cuda13-vllm-omni"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-vllm-omni"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-nvidia-cuda-13-vllm-omni
|
||||
- !!merge <<: *vllm-omni
|
||||
name: "cuda13-nvidia-l4t-arm64-vllm-omni"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm-omni"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm-omni
|
||||
- !!merge <<: *vllm-omni
|
||||
name: "rocm-vllm-omni"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-vllm-omni"
|
||||
@@ -1908,6 +1991,16 @@
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-vllm-omni"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-nvidia-cuda-12-vllm-omni
|
||||
- !!merge <<: *vllm-omni
|
||||
name: "cuda13-vllm-omni-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-vllm-omni"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-nvidia-cuda-13-vllm-omni
|
||||
- !!merge <<: *vllm-omni
|
||||
name: "cuda13-nvidia-l4t-arm64-vllm-omni-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-vllm-omni"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-vllm-omni
|
||||
- !!merge <<: *vllm-omni
|
||||
name: "rocm-vllm-omni-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-vllm-omni"
|
||||
@@ -3834,3 +3927,30 @@
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-speaker-recognition"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-nvidia-cuda-12-speaker-recognition
|
||||
## sherpa-onnx
|
||||
- !!merge <<: *sherpa-onnx
|
||||
name: "sherpa-onnx-development"
|
||||
capabilities:
|
||||
default: "cpu-sherpa-onnx-development"
|
||||
nvidia: "cuda12-sherpa-onnx-development"
|
||||
nvidia-cuda-12: "cuda12-sherpa-onnx-development"
|
||||
- !!merge <<: *sherpa-onnx
|
||||
name: "cpu-sherpa-onnx"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-sherpa-onnx"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-cpu-sherpa-onnx
|
||||
- !!merge <<: *sherpa-onnx
|
||||
name: "cpu-sherpa-onnx-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-cpu-sherpa-onnx"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-cpu-sherpa-onnx
|
||||
- !!merge <<: *sherpa-onnx
|
||||
name: "cuda12-sherpa-onnx"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-sherpa-onnx"
|
||||
mirrors:
|
||||
- localai/localai-backends:latest-gpu-nvidia-cuda-12-sherpa-onnx
|
||||
- !!merge <<: *sherpa-onnx
|
||||
name: "cuda12-sherpa-onnx-development"
|
||||
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-sherpa-onnx"
|
||||
mirrors:
|
||||
- localai/localai-backends:master-gpu-nvidia-cuda-12-sherpa-onnx
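
The `capabilities` maps in these gallery entries pair a hardware key with a concrete backend name. A minimal sketch of how such a map could be consulted is shown below. The lookup rule (exact key, else `default`) and the function name `resolve_backend` are assumptions for illustration; the actual resolution logic in LocalAI may be more elaborate.

```python
def resolve_backend(capabilities: dict, detected: str) -> str:
    """Return the backend name for a detected capability key.

    Assumed rule for this sketch: use the exact key if present,
    otherwise fall back to the "default" entry.
    """
    if detected in capabilities:
        return capabilities[detected]
    return capabilities["default"]


# Values copied from the sherpa-onnx entry in the diff above.
SHERPA_ONNX = {
    "default": "cpu-sherpa-onnx",
    "nvidia": "cuda12-sherpa-onnx",
    "nvidia-cuda-12": "cuda12-sherpa-onnx",
}
```

Under this assumed rule, a host reporting a key with no entry (for example `amd`) would receive the CPU image.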
|
||||
|
||||
@@ -173,6 +173,30 @@ def _build_antispoofer(options: dict[str, str], model_dir: str | None) -> Antisp
|
||||
|
||||
# ─── InsightFaceEngine ────────────────────────────────────────────────
|
||||
|
||||
# Canonical ONNX manifest for each upstream insightface pack (v0.7 release
|
||||
# at github.com/deepinsight/insightface/releases). LocalAI's gallery extracts
|
||||
# these zips flat into the models directory, so when multiple packs or other
|
||||
# backends drop their own ONNX files alongside, the glob-the-directory
|
||||
# approach picks up foreign files and insightface's model_zoo.get_model()
|
||||
# raises IndexError trying to index `input_shape[2]` on a tensor that isn't
|
||||
# shaped like a face model. The manifest lets us pre-filter to only the
|
||||
# files that actually belong to the requested pack — deterministic, correct
|
||||
# pack choice, no crashes on neighbour ONNX files.
|
||||
_KNOWN_PACK_MANIFESTS: dict[str, frozenset[str]] = {
|
||||
"buffalo_l": frozenset({
|
||||
"det_10g.onnx",
|
||||
"w600k_r50.onnx",
|
||||
"genderage.onnx",
|
||||
"2d106det.onnx",
|
||||
"1k3d68.onnx",
|
||||
}),
|
||||
"buffalo_sc": frozenset({
|
||||
"det_500m.onnx",
|
||||
"w600k_mbf.onnx",
|
||||
}),
|
||||
}
|
||||
|
||||
|
||||
class InsightFaceEngine:
|
||||
"""Drives insightface's model_zoo directly — no FaceAnalysis wrapper.
|
||||
|
||||
@@ -222,6 +246,21 @@ class InsightFaceEngine:
|
||||
)
|
||||
|
||||
onnx_files = sorted(glob.glob(os.path.join(pack_dir, "*.onnx")))
|
||||
# When the pack extracts flat into a shared models directory it
|
||||
# mixes with ONNX files from other backends (opencv face engine,
|
||||
# MiniFASNet antispoof, WeSpeaker voice embedding, other buffalo
|
||||
# packs installed earlier). Feeding those into model_zoo.get_model()
|
||||
# blows up inside insightface's router — it assumes a 4-D NCHW
|
||||
# input and indexes `input_shape[2]` on tensors that aren't shaped
|
||||
# like a face model, raising IndexError. For the upstream packs we
|
||||
# know the exact ONNX manifest; scoping to it makes the load
|
||||
# deterministic (without it, det_10g.onnx from buffalo_l sorts
|
||||
# before det_500m.onnx from buffalo_sc and silently wins).
|
||||
manifest = _KNOWN_PACK_MANIFESTS.get(self.model_pack)
|
||||
if manifest is not None:
|
||||
scoped = [f for f in onnx_files if os.path.basename(f) in manifest]
|
||||
if scoped:
|
||||
onnx_files = scoped
|
||||
if not onnx_files:
|
||||
raise ValueError(f"no ONNX files in pack directory: {pack_dir}")
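
The manifest scoping in this hunk can be exercised in isolation. The sketch below mirrors the filtering step with plain path strings; `scope_to_pack` is a hypothetical helper name and the `/models/...` paths are made up for the example.

```python
import os

# Subset of the manifest table from the diff, enough for the example.
KNOWN_PACK_MANIFESTS = {
    "buffalo_sc": frozenset({"det_500m.onnx", "w600k_mbf.onnx"}),
}


def scope_to_pack(onnx_files, pack, manifests):
    """Keep only files that belong to the named pack.

    Falls back to the unfiltered list when the pack is unknown or when
    none of the manifest files are present, as the hunk above does.
    """
    manifest = manifests.get(pack)
    if manifest is None:
        return list(onnx_files)
    scoped = [f for f in onnx_files if os.path.basename(f) in manifest]
    return scoped or list(onnx_files)


# A shared directory holding two packs plus an unrelated model.
shared_dir = sorted([
    "/models/det_10g.onnx",
    "/models/det_500m.onnx",
    "/models/w600k_mbf.onnx",
    "/models/voice.onnx",
])
selected = scope_to_pack(shared_dir, "buffalo_sc", KNOWN_PACK_MANIFESTS)
```

Without the filter, `det_10g.onnx` sorts first and would be the detector that gets loaded, which is the silent mis-selection the comment in the diff describes.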
|
||||
|
||||
@@ -231,14 +270,31 @@ class InsightFaceEngine:
|
||||
self._providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
|
||||
|
||||
self.models = {}
|
||||
skipped: list[tuple[str, str]] = []
|
||||
for onnx_file in onnx_files:
|
||||
m = model_zoo.get_model(onnx_file, providers=self._providers)
|
||||
try:
|
||||
m = model_zoo.get_model(onnx_file, providers=self._providers)
|
||||
except Exception as err:
|
||||
# Foreign ONNX (wrong rank/shape, non-insightface model) —
|
||||
# older insightface versions raise IndexError / ValueError
|
||||
# instead of returning None. Keep loading the rest.
|
||||
skipped.append((os.path.basename(onnx_file), str(err)))
|
||||
continue
|
||||
if m is None:
|
||||
skipped.append((os.path.basename(onnx_file), "unknown taskname"))
|
||||
continue
|
||||
# First occurrence of each taskname wins (matches FaceAnalysis).
|
||||
if m.taskname not in self.models:
|
||||
self.models[m.taskname] = m
|
||||
|
||||
if skipped:
|
||||
import sys
|
||||
print(
|
||||
f"[insightface] skipped {len(skipped)} non-pack ONNX file(s) in {pack_dir}: "
|
||||
+ ", ".join(f"{n} ({why})" for n, why in skipped),
|
||||
file=sys.stderr,
|
||||
)
|
||||
|
||||
if "detection" not in self.models:
|
||||
raise ValueError(f"no detector (taskname='detection') found in {pack_dir}")
|
||||
self.det_model = self.models["detection"]
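
The skip-and-continue loading loop above can be sketched without insightface by injecting the loader. `load_models` and `fake_loader` are hypothetical names, and models are represented as plain dicts purely for illustration.

```python
def load_models(paths, loader):
    """Load every path, collecting failures instead of aborting.

    The first model seen for each taskname wins, as in the hunk above.
    """
    models, skipped = {}, []
    for path in paths:
        try:
            model = loader(path)
        except Exception as err:  # foreign file: wrong rank or shape
            skipped.append((path, str(err)))
            continue
        if model is None:
            skipped.append((path, "unknown taskname"))
            continue
        models.setdefault(model["taskname"], model)
    return models, skipped


def fake_loader(path):
    """Stand-in loader: raises, returns None, or returns a dict 'model'."""
    if path == "voice.onnx":
        raise IndexError("tuple index out of range")
    if path == "mystery.onnx":
        return None
    task = "detection" if path.startswith("det_") else "recognition"
    return {"taskname": task, "path": path}


models, skipped = load_models(
    ["det_10g.onnx", "det_500m.onnx", "mystery.onnx", "voice.onnx", "w600k_r50.onnx"],
    fake_loader,
)
```

Collecting the skipped files and reporting them once keeps the log readable while still making it obvious which neighbours were ignored.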
|
||||
|
||||
@@ -1,2 +1,2 @@
|
||||
git+https://github.com/Blaizzy/mlx-vlm
|
||||
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
|
||||
mlx[cpu]
|
||||
@@ -1,2 +1,2 @@
|
||||
git+https://github.com/Blaizzy/mlx-vlm
|
||||
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
|
||||
mlx[cuda12]
|
||||
@@ -1,2 +1,2 @@
|
||||
git+https://github.com/Blaizzy/mlx-vlm
|
||||
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
|
||||
mlx[cuda13]
|
||||
@@ -1,2 +1,2 @@
|
||||
git+https://github.com/Blaizzy/mlx-vlm
|
||||
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
|
||||
mlx[cuda12]
|
||||
@@ -1,2 +1,2 @@
|
||||
git+https://github.com/Blaizzy/mlx-vlm
|
||||
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
|
||||
mlx[cuda13]
|
||||
@@ -1 +1 @@
|
||||
git+https://github.com/Blaizzy/mlx-vlm
|
||||
git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
|
||||
@@ -23,6 +23,19 @@ if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
|
||||
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
|
||||
fi
|
||||
|
||||
# JetPack 7 / L4T arm64 wheels are built for cp312 and shipped via
|
||||
# pypi.jetson-ai-lab.io. Bump the venv Python so the prebuilt sglang
|
||||
# wheel resolves cleanly. unsafe-best-match is required because the
|
||||
# jetson-ai-lab index lists transitive deps (e.g. decord) at older
|
||||
# versions only — without it uv refuses to fall through to PyPI for a
|
||||
# compatible wheel and resolution fails.
|
||||
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
|
||||
PYTHON_VERSION="3.12"
|
||||
PYTHON_PATCH="12"
|
||||
PY_STANDALONE_TAG="20251120"
|
||||
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
|
||||
fi
|
||||
|
||||
# sglang's CPU path has no prebuilt wheel on PyPI — upstream publishes
|
||||
# a separate pyproject_cpu.toml that must be swapped in before `pip install`.
|
||||
# Reference: docker/xeon.Dockerfile in the sglang upstream repo.
|
||||
|
||||
12
backend/python/sglang/requirements-l4t13.txt
Normal file
@@ -0,0 +1,12 @@
|
||||
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
|
||||
accelerate
|
||||
torch
|
||||
torchvision
|
||||
torchaudio
|
||||
transformers
|
||||
# Drop the [all] extra: it pulls outlines/decord, and decord has no
|
||||
# aarch64 cp312 wheel anywhere (PyPI nor the jetson-ai-lab index ships
|
||||
# only legacy cp35-cp37). With [all] uv backtracks through versions
|
||||
# trying to satisfy decord and lands on sglang==0.1.16. Floor at 0.5.0
|
||||
# so uv can't silently downgrade if a future resolution misfires.
|
||||
sglang>=0.5.0
|
||||
@@ -317,8 +317,23 @@ class OnnxDirectEngine:
|
||||
else:
|
||||
provider_list = ["CPUExecutionProvider"]
|
||||
self._session = ort.InferenceSession(onnx_path, providers=provider_list)
|
||||
self._input_name = self._session.get_inputs()[0].name
|
||||
input_meta = self._session.get_inputs()[0]
|
||||
self._input_name = input_meta.name
|
||||
# Pre-exported speaker encoders come in two shapes:
|
||||
# rank-2 [batch, samples] — some 3D-Speaker exports feed raw waveform.
|
||||
# rank-3 [batch, frames, n_mels] — WeSpeaker and most Kaldi-lineage encoders
|
||||
# expect pre-computed Kaldi FBank features.
|
||||
# We detect this at load time and branch in embed(), because feeding raw audio
|
||||
# into a rank-3 graph is exactly what triggered
|
||||
# "Invalid rank for input: feats Got: 2 Expected: 3".
|
||||
self._input_rank = len(input_meta.shape) if input_meta.shape is not None else 2
|
||||
self._expected_sr = int(options.get("sample_rate", "16000"))
|
||||
self._fbank_mels = int(options.get("fbank_num_mel_bins", "80"))
|
||||
self._fbank_frame_length_ms = float(options.get("fbank_frame_length_ms", "25"))
|
||||
self._fbank_frame_shift_ms = float(options.get("fbank_frame_shift_ms", "10"))
|
||||
# Per-utterance cepstral mean normalisation — on for WeSpeaker by default,
|
||||
# toggleable for encoders that expect raw FBank.
|
||||
self._fbank_cmn = options.get("fbank_cmn", "true").lower() in ("1", "true", "yes")
|
||||
self._analysis = AnalysisHead(options)
|
||||
|
||||
def _load_waveform(self, path: str):
|
||||
@@ -344,11 +359,37 @@ class OnnxDirectEngine:
|
||||
import numpy as np
|
||||
|
||||
audio = self._load_waveform(audio_path)
|
||||
feed = audio.reshape(1, -1)
|
||||
if self._input_rank >= 3:
|
||||
feats = self._extract_fbank(audio) # [frames, n_mels]
|
||||
feed = feats[np.newaxis, :, :] # [1, frames, n_mels]
|
||||
else:
|
||||
feed = audio.reshape(1, -1) # [1, samples]
|
||||
out = self._session.run(None, {self._input_name: feed})
|
||||
vec = np.asarray(out[0]).reshape(-1)
|
||||
return [float(x) for x in vec]
|
||||
|
||||
def _extract_fbank(self, audio):
|
||||
"""Compute Kaldi-style 80-dim FBank features for speaker encoders that
|
||||
expect pre-featurised input (WeSpeaker, most 3D-Speaker exports).
|
||||
torchaudio is already a backend dependency for SpeechBrain — no new
|
||||
package required."""
|
||||
import numpy as np
|
||||
import torch # type: ignore
|
||||
import torchaudio.compliance.kaldi as kaldi # type: ignore
|
||||
|
||||
tensor = torch.from_numpy(audio).unsqueeze(0) # [1, samples]
|
||||
feats = kaldi.fbank(
|
||||
tensor,
|
||||
sample_frequency=self._expected_sr,
|
||||
num_mel_bins=self._fbank_mels,
|
||||
frame_length=self._fbank_frame_length_ms,
|
||||
frame_shift=self._fbank_frame_shift_ms,
|
||||
dither=0.0,
|
||||
) # [frames, n_mels]
|
||||
if self._fbank_cmn:
|
||||
feats = feats - feats.mean(dim=0, keepdim=True)
|
||||
return feats.numpy().astype(np.float32)
|
||||
|
||||
def compare(self, audio1: str, audio2: str) -> float:
|
||||
return _cosine_distance(self.embed(audio1), self.embed(audio2))
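
The rank-based branch and the per-utterance mean normalisation above can be illustrated with plain lists. This is a sketch only: `build_feed`, `cmn`, and `toy_featurize` are hypothetical names, and the toy featuriser merely chunks samples into pairs instead of computing real FBank features.

```python
def cmn(frames):
    """Subtract the per-bin mean over time from every frame."""
    n = len(frames)
    means = [sum(col) / n for col in zip(*frames)]
    return [[v - m for v, m in zip(frame, means)] for frame in frames]


def build_feed(audio, input_rank, featurize, apply_cmn=True):
    """Shape the model input according to the graph's declared input rank.

    rank >= 3: [1, frames, n_mels] features; otherwise [1, samples] waveform.
    """
    if input_rank >= 3:
        feats = featurize(audio)
        if apply_cmn:
            feats = cmn(feats)
        return [feats]
    return [list(audio)]


def toy_featurize(audio):
    """Hypothetical featuriser: groups samples into 2-value 'frames'."""
    return [list(audio[i:i + 2]) for i in range(0, len(audio), 2)]


waveform = [1.0, 2.0, 3.0, 4.0]
raw_feed = build_feed(waveform, 2, toy_featurize)
feat_feed = build_feed(waveform, 3, toy_featurize)
```

Deciding once at load time from the declared input rank avoids the "Invalid rank for input" error quoted in the diff, since a rank-3 graph never receives a rank-2 tensor.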
|
||||
|
||||
|
||||
@@ -12,11 +12,15 @@ else
|
||||
source $backend_dir/../common/libbackend.sh
|
||||
fi
|
||||
|
||||
# Handle l4t build profiles (Python 3.12, pip fallback) if needed
|
||||
# Handle l4t build profiles (Python 3.12, pip fallback) if needed.
|
||||
# unsafe-best-match is required on l4t13 because the jetson-ai-lab index
|
||||
# lists transitive deps at limited versions — without it uv pins to the
|
||||
# first matching index and fails to resolve a compatible wheel from PyPI.
|
||||
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
|
||||
PYTHON_VERSION="3.12"
|
||||
PYTHON_PATCH="12"
|
||||
PY_STANDALONE_TAG="20251120"
|
||||
EXTRA_PIP_INSTALL_FLAGS="${EXTRA_PIP_INSTALL_FLAGS:-} --index-strategy=unsafe-best-match"
|
||||
fi
|
||||
|
||||
if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
|
||||
@@ -26,7 +30,11 @@ fi
|
||||
# Install base requirements first
|
||||
installRequirements
|
||||
|
||||
# Install vllm based on build type
|
||||
# Install vllm based on build type. vllm-omni tracks vllm master from
|
||||
# source (cloned below) so we leave the upstream vllm dependency unpinned
|
||||
# — vllm 0.19+ ships cu130 wheels by default, which is what we want for
|
||||
# cublas13. Older cuda12/rocm/cpu paths still resolve a compatible wheel
|
||||
# from the relevant channel.
|
||||
if [ "x${BUILD_TYPE}" == "xhipblas" ]; then
|
||||
# ROCm
|
||||
if [ "x${USE_PIP}" == "xtrue" ]; then
|
||||
@@ -34,8 +42,26 @@ if [ "x${BUILD_TYPE}" == "xhipblas" ]; then
|
||||
else
|
||||
uv pip install vllm==0.14.0 --extra-index-url https://wheels.vllm.ai/rocm/0.14.0/rocm700
|
||||
fi
|
||||
elif [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
|
||||
# JetPack 7 / L4T arm64 cu130 — vllm comes from the prebuilt SBSA wheel
|
||||
# at jetson-ai-lab. Version is unpinned: the index ships whatever build
|
||||
# matches the cu130/cp312 ABI. unsafe-best-match lets uv fall through
|
||||
# to PyPI for transitive deps not present on the jetson-ai-lab index.
|
||||
if [ "x${USE_PIP}" == "xtrue" ]; then
|
||||
pip install vllm --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
|
||||
else
|
||||
uv pip install --index-strategy=unsafe-best-match vllm --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
|
||||
fi
|
||||
elif [ "x${BUILD_PROFILE}" == "xcublas13" ]; then
|
||||
# vllm 0.19+ defaults to cu130 wheels on PyPI, no extra index needed.
|
||||
if [ "x${USE_PIP}" == "xtrue" ]; then
|
||||
pip install vllm --torch-backend=auto
|
||||
else
|
||||
uv pip install vllm --torch-backend=auto
|
||||
fi
|
||||
elif [ "x${BUILD_TYPE}" == "xcublas" ] || [ "x${BUILD_TYPE}" == "x" ]; then
|
||||
# CUDA (default) or CPU
|
||||
# cuda12 / CPU — keep the 0.14.0 pin for compatibility with the existing
|
||||
# cuda12 vllm-omni image; bumping should be its own change.
|
||||
if [ "x${USE_PIP}" == "xtrue" ]; then
|
||||
pip install vllm==0.14.0 --torch-backend=auto
|
||||
else
|
||||
|
||||
5
backend/python/vllm-omni/requirements-cublas13.txt
Normal file
@@ -0,0 +1,5 @@
|
||||
--extra-index-url https://download.pytorch.org/whl/cu130
|
||||
accelerate
|
||||
torch
|
||||
transformers
|
||||
bitsandbytes
|
||||
13
backend/python/vllm-omni/requirements-l4t13.txt
Normal file
@@ -0,0 +1,13 @@
|
||||
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
|
||||
accelerate
|
||||
torch
|
||||
torchvision
|
||||
torchaudio
|
||||
transformers
|
||||
bitsandbytes
|
||||
flash-attn
|
||||
diffusers
|
||||
librosa
|
||||
soundfile
|
||||
pillow
|
||||
numpy
|
||||
@@ -32,6 +32,22 @@ if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
|
||||
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
|
||||
fi
|
||||
|
||||
# JetPack 7 / L4T arm64 wheels (torch, vllm, flash-attn) live on
|
||||
# pypi.jetson-ai-lab.io and are built for cp312, so bump the venv Python
|
||||
# accordingly. JetPack 6 keeps cp310 + USE_PIP=true. unsafe-best-match
|
||||
# is required because the jetson-ai-lab index lists transitive deps at
|
||||
# limited versions — without it uv pins to the first matching index and
|
||||
# fails to resolve a compatible wheel from PyPI.
|
||||
if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
|
||||
USE_PIP=true
|
||||
fi
|
||||
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
|
||||
PYTHON_VERSION="3.12"
|
||||
PYTHON_PATCH="12"
|
||||
PY_STANDALONE_TAG="20251120"
|
||||
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
|
||||
fi
|
||||
|
||||
# FROM_SOURCE=true on a CPU build skips the prebuilt vllm wheel in
|
||||
# requirements-cpu-after.txt and compiles vllm locally against the host's
|
||||
# actual CPU. Not used by default because it takes ~30-40 minutes, but
|
||||
|
||||
@@ -1,2 +1,9 @@
|
||||
https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
|
||||
# flash-attn wheels are ABI-tied to a specific torch version. vllm forces
|
||||
# torch==2.10.0 as a hard dep, but flash-attn 2.8.3 (latest) only ships
|
||||
# prebuilt wheels up to torch 2.8 — any wheel we pin here gets silently
|
||||
# broken when vllm upgrades torch during install, producing an undefined
|
||||
# libc10_cuda symbol at import time. FlashInfer (required by vllm) covers
|
||||
# attention, and rotary_embedding/common.py guards the flash_attn import
|
||||
# with find_spec(), so skipping flash-attn is safe and the only stable
|
||||
# choice until upstream ships a torch-2.10 wheel.
|
||||
vllm
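
The comment above relies on the `flash_attn` import being guarded with `find_spec()`. A minimal sketch of that guard pattern using the standard library is shown below; `has_module` and `pick_attention_impl` are hypothetical names for illustration and are not vllm's code.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Report whether a top-level module is importable, without importing it."""
    return importlib.util.find_spec(name) is not None


# Evaluated once at import time; no flash_attn code is loaded by this check.
HAS_FLASH_ATTN = has_module("flash_attn")


def pick_attention_impl() -> str:
    """Choose an implementation label based on what is installed."""
    return "flash_attn" if HAS_FLASH_ATTN else "fallback"
```

Because `find_spec` only locates the module, an installed but ABI-broken wheel would still be reported as present, which is one reason leaving the package out entirely is the safer choice here.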
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
accelerate
|
||||
torch==2.7.0
|
||||
torch
|
||||
transformers
|
||||
bitsandbytes
|
||||
2
backend/python/vllm/requirements-cublas13-after.txt
Normal file
@@ -0,0 +1,2 @@
|
||||
--extra-index-url https://download.pytorch.org/whl/cu130
|
||||
vllm
|
||||
5
backend/python/vllm/requirements-cublas13.txt
Normal file
@@ -0,0 +1,5 @@
|
||||
--extra-index-url https://download.pytorch.org/whl/cu130
|
||||
accelerate
|
||||
torch
|
||||
transformers
|
||||
bitsandbytes
|
||||
2
backend/python/vllm/requirements-l4t13-after.txt
Normal file
@@ -0,0 +1,2 @@
|
||||
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
|
||||
vllm
|
||||
8
backend/python/vllm/requirements-l4t13.txt
Normal file
@@ -0,0 +1,8 @@
|
||||
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
|
||||
accelerate
|
||||
torch
|
||||
torchvision
|
||||
torchaudio
|
||||
transformers
|
||||
bitsandbytes
|
||||
flash-attn
|
||||
4
backend/rust/kokoros/Cargo.lock
generated
@@ -1867,9 +1867,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "rustls-webpki"
|
||||
version = "0.103.10"
|
||||
version = "0.103.13"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "df33b2b81ac578cabaf06b89b0631153a3f416b0a886e8a7a1707fb51abbd1ef"
|
||||
checksum = "61c429a8649f110dddef65e2a5ad240f747e85f7758a6bccc7e5777bd33f756e"
|
||||
dependencies = [
|
||||
"ring",
|
||||
"rustls-pki-types",
|
||||
|
||||
@@ -81,18 +81,30 @@ func newApplication(appConfig *config.ApplicationConfig) *Application {
|
||||
// The resolver closes over the ModelLoader so the Registry stays
|
||||
// decoupled from loader plumbing; swapping in a postgres-backed
|
||||
// implementation later is a single construction change here.
|
||||
//
|
||||
// `faceStoreName` is the default namespace passed to StoreBackend when
|
||||
// the request doesn't override it. Face and voice MUST use distinct
|
||||
// namespaces — the local-store gRPC surface rejects mixed dimensions
|
||||
// inside one namespace ("Try to add key with length N when existing
|
||||
// length is M"). ArcFace buffalo_l produces 512-dim embeddings while
|
||||
// ECAPA-TDNN produces 192-dim; enrolling one after the other into a
|
||||
// shared namespace is exactly how we hit that error.
|
||||
const (
|
||||
faceStoreName = "localai-face-biometrics"
|
||||
voiceStoreName = "localai-voice-biometrics"
|
||||
)
|
||||
faceStoreResolver := func(_ context.Context, storeName string) (pkggrpc.Backend, error) {
|
||||
return corebackend.StoreBackend(ml, appConfig, storeName, "")
|
||||
}
|
||||
app.faceRegistry = facerecognition.NewStoreRegistry(faceStoreResolver, "", faceEmbeddingDim)
|
||||
app.faceRegistry = facerecognition.NewStoreRegistry(faceStoreResolver, faceStoreName, faceEmbeddingDim)
|
||||
|
||||
// Voice (speaker) recognition registry — same plumbing, separate
|
||||
// registry so embedding spaces stay isolated (a face vector and a
|
||||
// speaker vector are not comparable).
|
||||
// namespace so embedding spaces stay isolated (a face vector and a
|
||||
// speaker vector are not comparable and differ in dimensionality).
|
||||
voiceStoreResolver := func(_ context.Context, storeName string) (pkggrpc.Backend, error) {
|
||||
return corebackend.StoreBackend(ml, appConfig, storeName, "")
|
||||
}
|
||||
app.voiceRegistry = voicerecognition.NewStoreRegistry(voiceStoreResolver, "", voiceEmbeddingDim)
|
||||
app.voiceRegistry = voicerecognition.NewStoreRegistry(voiceStoreResolver, voiceStoreName, voiceEmbeddingDim)
|
||||
|
||||
return app
|
||||
}
|
||||
|
||||
@@ -242,6 +242,12 @@ func New(opts ...config.AppOption) (*Application, error) {
|
||||
bmFn := func() galleryop.BackendManager { return application.GalleryService().BackendManager() }
|
||||
uc := NewUpgradeChecker(options, application.ModelLoader(), application.distributedDB(), bmFn)
|
||||
application.upgradeChecker = uc
|
||||
// Refresh the upgrade cache the moment a backend op finishes — otherwise
|
||||
// the UI keeps showing a just-upgraded backend as upgradeable until the
|
||||
// next 6-hour tick. TriggerCheck is non-blocking.
|
||||
if gs := application.GalleryService(); gs != nil {
|
||||
gs.OnBackendOpCompleted = uc.TriggerCheck
|
||||
}
|
||||
go uc.Run(options.Context)
|
||||
}
|
||||
|
||||
|
||||
@@ -11,8 +11,17 @@ func StoreBackend(sl *model.ModelLoader, appConfig *config.ApplicationConfig, st
|
||||
if backend == "" {
|
||||
backend = model.LocalStoreBackend
|
||||
}
|
||||
// ModelLoader caches backend processes by `modelID`, not by the `model`
|
||||
// passed via WithModel. Without a distinct modelID, every StoreBackend
|
||||
// call collapses to the same `modelID=""` cache slot — face (512-D) and
|
||||
// voice (192-D) biometrics would then share the same local-store process
|
||||
// and the second enrollment would fail with
|
||||
// Try to add key with length N when existing length is M
|
||||
// Use the store namespace as modelID so each namespace gets its own
|
||||
// process instance and its own in-memory Store{}.
|
||||
sc := []model.Option{
|
||||
model.WithBackendString(backend),
|
||||
model.WithModelID(storeName),
|
||||
model.WithModel(storeName),
|
||||
}
|
||||
|
||||
|
||||
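The cache-slot collision described in the StoreBackend comment can be reproduced with a toy model. This is a minimal sketch, not LocalAI's loader: `store`, `loader`, and `enroll` are hypothetical stand-ins, and the only behaviour assumed is what the comment states (processes are cached by modelID, and a store rejects keys whose length differs from the first one it saw).

```go
package main

import "fmt"

// store stands in for one local-store process (hypothetical). It pins the
// vector length of the first key it receives and rejects any other length.
type store struct{ dim int }

func (s *store) add(key []float32) error {
	if s.dim == 0 {
		s.dim = len(key)
		return nil
	}
	if len(key) != s.dim {
		return fmt.Errorf("Try to add key with length %d when existing length is %d", len(key), s.dim)
	}
	return nil
}

// loader is a hypothetical process cache keyed by modelID, mirroring the
// behaviour the comment attributes to ModelLoader.
type loader struct{ procs map[string]*store }

func (l *loader) get(modelID string) *store {
	if s, ok := l.procs[modelID]; ok {
		return s
	}
	s := &store{}
	l.procs[modelID] = s
	return s
}

// enroll adds a 512-dim face vector and then a 192-dim voice vector,
// resolving each store through the given modelIDs.
func enroll(faceID, voiceID string) error {
	l := &loader{procs: map[string]*store{}}
	if err := l.get(faceID).add(make([]float32, 512)); err != nil {
		return err
	}
	return l.get(voiceID).add(make([]float32, 192))
}

func main() {
	// Both lookups use modelID "" and land in the same slot.
	fmt.Println("shared slot:  ", enroll("", ""))
	// One modelID per namespace gives each its own store.
	fmt.Println("distinct slot:", enroll("localai-face-biometrics", "localai-voice-biometrics"))
}
```

With a shared empty modelID the second enrollment returns the dimension error; with one modelID per namespace both succeed, which is the effect of passing `model.WithModelID(storeName)`.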
@@ -90,6 +90,14 @@ type WorkerCMD struct {
|
||||
RegistrationToken string `env:"LOCALAI_REGISTRATION_TOKEN" help:"Token for authenticating with the frontend" group:"registration"`
|
||||
HeartbeatInterval string `env:"LOCALAI_HEARTBEAT_INTERVAL" default:"10s" help:"Interval between heartbeats" group:"registration"`
|
||||
NodeLabels string `env:"LOCALAI_NODE_LABELS" help:"Comma-separated key=value labels for this node (e.g. tier=fast,gpu=a100)" group:"registration"`
|
||||
// MaxReplicasPerModel caps how many replicas of any one model can run on
|
||||
// this worker concurrently. Default 1 = historical single-replica
|
||||
// behavior. Set higher when a node has enough VRAM to host multiple
|
||||
// copies of the same model (e.g. a fat 128 GiB box running 4× of a
|
||||
// 24 GiB model for throughput). The auto-label `node.replica-slots=N`
|
||||
// is published so model schedulers can target high-capacity nodes via
|
||||
// the existing label selector.
|
||||
MaxReplicasPerModel int `env:"LOCALAI_MAX_REPLICAS_PER_MODEL" default:"1" help:"Max replicas of any single model on this worker. Default 1 preserves single-replica behavior; set higher to allow stacking replicas on a fat node." group:"registration"`
|
||||
|
||||
// NATS (required)
|
||||
NatsURL string `env:"LOCALAI_NATS_URL" required:"" help:"NATS server URL" group:"distributed"`
|
||||
@@ -567,22 +575,35 @@ func (s *backendSupervisor) getAddr(backend string) string {
|
||||
return ""
|
||||
}
|
||||
|
||||
// buildProcessKey is the supervisor's stable identifier for a backend gRPC
|
||||
// process. It includes the replica index so the same model can run multiple
|
||||
// processes on a worker simultaneously without colliding on the same map slot
|
||||
// or port. The "#N" suffix is purely internal — the controller never reads it.
|
||||
func buildProcessKey(modelID, backend string, replicaIndex int) string {
|
||||
base := modelID
|
||||
if base == "" {
|
||||
base = backend
|
||||
}
|
||||
return fmt.Sprintf("%s#%d", base, replicaIndex)
|
||||
}
|
||||
|
||||
// installBackend handles the backend.install flow:
|
||||
// 1. If already running for this model, return existing address
|
||||
// 1. If already running for this (model, replica) slot, return existing address
|
||||
// 2. Install backend from gallery (if not already installed)
|
||||
// 3. Find backend binary
|
||||
// 4. Start gRPC process on a new port
|
||||
// Returns the gRPC address of the backend process.
|
||||
//
|
||||
// ProcessKey includes the replica index so a worker with MaxReplicasPerModel>1
|
||||
// can host multiple processes for the same model on distinct ports. Old
|
||||
// controllers (no replica_index in the request) implicitly target replica 0,
|
||||
// which preserves single-replica behavior.
|
||||
func (s *backendSupervisor) installBackend(req messaging.BackendInstallRequest) (string, error) {
|
||||
// Process key: use ModelID if provided (per-model process), else backend name
|
||||
processKey := req.ModelID
|
||||
if processKey == "" {
|
||||
processKey = req.Backend
|
||||
}
|
||||
processKey := buildProcessKey(req.ModelID, req.Backend, int(req.ReplicaIndex))
|
||||
|
||||
// If already running for this model, return its address
|
||||
// If already running for this model+replica, return its address
|
||||
if addr := s.getAddr(processKey); addr != "" {
|
||||
xlog.Info("Backend already running for model", "backend", req.Backend, "model", req.ModelID, "addr", addr)
|
||||
xlog.Info("Backend already running for model replica", "backend", req.Backend, "model", req.ModelID, "replica", req.ReplicaIndex, "addr", addr)
|
||||
return addr, nil
|
||||
}
|
||||
|
||||
@@ -886,13 +907,18 @@ func (cmd *WorkerCMD) registrationBody() map[string]any {
|
||||
totalVRAM, _ := xsysinfo.TotalAvailableVRAM()
|
||||
gpuVendor, _ := xsysinfo.DetectGPUVendor()
|
||||
|
||||
maxReplicas := cmd.MaxReplicasPerModel
|
||||
if maxReplicas < 1 {
|
||||
maxReplicas = 1
|
||||
}
|
||||
body := map[string]any{
|
||||
"name": nodeName,
|
||||
"address": cmd.advertiseAddr(),
|
||||
"http_address": cmd.advertiseHTTPAddr(),
|
||||
"total_vram": totalVRAM,
|
||||
"available_vram": totalVRAM, // initially all VRAM is available
|
||||
"gpu_vendor": gpuVendor,
|
||||
"name": nodeName,
|
||||
"address": cmd.advertiseAddr(),
|
||||
"http_address": cmd.advertiseHTTPAddr(),
|
||||
"total_vram": totalVRAM,
|
||||
"available_vram": totalVRAM, // initially all VRAM is available
|
||||
"gpu_vendor": gpuVendor,
|
||||
"max_replicas_per_model": maxReplicas,
|
||||
}
|
||||
|
||||
// If no GPU detected, report system RAM so the scheduler/UI has capacity info
|
||||
@@ -906,39 +932,40 @@ func (cmd *WorkerCMD) registrationBody() map[string]any {
|
||||
body["token"] = cmd.RegistrationToken
|
||||
}
|
||||
|
||||
// Parse and add static node labels
|
||||
// Parse and add static node labels. Always include the auto-label
|
||||
// `node.replica-slots=N` so AND-selectors in ModelSchedulingConfig can
|
||||
// target high-capacity nodes (e.g. {"node.replica-slots":"4"}).
|
||||
labels := make(map[string]string)
|
||||
if cmd.NodeLabels != "" {
|
||||
labels := make(map[string]string)
|
||||
for _, pair := range strings.Split(cmd.NodeLabels, ",") {
|
||||
pair = strings.TrimSpace(pair)
|
||||
if k, v, ok := strings.Cut(pair, "="); ok {
|
||||
labels[strings.TrimSpace(k)] = strings.TrimSpace(v)
|
||||
}
|
||||
}
|
||||
if len(labels) > 0 {
|
||||
body["labels"] = labels
|
||||
}
|
||||
}
|
||||
labels["node.replica-slots"] = strconv.Itoa(maxReplicas)
|
||||
body["labels"] = labels
|
||||
|
||||
return body
|
||||
}
|
||||
|
||||
// heartbeatBody returns the current VRAM/RAM stats for heartbeat payloads.
|
||||
//
|
||||
// When aggregate VRAM usage is unknown (no GPU, or temporary detection
|
||||
// failure), we deliberately OMIT available_vram so the frontend keeps its
|
||||
// last good value — overwriting with 0 makes the UI show the node as "fully
|
||||
// used", while reporting total-as-available lies to the scheduler about
|
||||
// free capacity.
|
||||
func (cmd *WorkerCMD) heartbeatBody() map[string]any {
|
||||
var availVRAM uint64
|
||||
body := map[string]any{}
|
||||
aggregate := xsysinfo.GetGPUAggregateInfo()
|
||||
if aggregate.TotalVRAM > 0 {
|
||||
availVRAM = aggregate.FreeVRAM
|
||||
} else {
|
||||
// Fallback: report total as available (no usage tracking possible)
|
||||
availVRAM, _ = xsysinfo.TotalAvailableVRAM()
|
||||
body["available_vram"] = aggregate.FreeVRAM
|
||||
}
|
||||
|
||||
body := map[string]any{
|
||||
"available_vram": availVRAM,
|
||||
}
|
||||
|
||||
// If no GPU, report system RAM usage instead
|
||||
// CPU-only workers (or workers that lost GPU visibility momentarily):
|
||||
// report system RAM so the scheduler still has capacity info.
|
||||
if aggregate.TotalVRAM == 0 {
|
||||
if ramInfo, err := xsysinfo.GetSystemRAMInfo(); err == nil {
|
||||
body["available_ram"] = ramInfo.Available
|
||||
|
||||
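The reasoning behind omitting `available_vram` in heartbeatBody can be sketched in isolation. The `heartbeat` and `merge` helpers below are hypothetical; in particular, `merge` only assumes that the frontend applies the keys present in a heartbeat and leaves the rest of the stored node state alone, as the comment implies.

```go
package main

import "fmt"

// heartbeat builds a payload from aggregate GPU numbers (hypothetical
// helper). When total VRAM is unknown, the available_vram key is omitted
// instead of being reported as 0 or as the total.
func heartbeat(totalVRAM, freeVRAM uint64) map[string]any {
	body := map[string]any{}
	if totalVRAM > 0 {
		body["available_vram"] = freeVRAM
	}
	return body
}

// merge applies a heartbeat onto the last-known node state. Keys absent
// from the heartbeat leave the stored value untouched (assumed frontend
// behaviour).
func merge(state, hb map[string]any) {
	for k, v := range hb {
		state[k] = v
	}
}

func main() {
	state := map[string]any{"available_vram": uint64(8 << 30)}

	merge(state, heartbeat(0, 0)) // detection failed: last good value survives
	fmt.Println(state["available_vram"])

	merge(state, heartbeat(24<<30, 4<<30)) // detection works: value is refreshed
	fmt.Println(state["available_vram"])
}
```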
70
core/cli/worker_replica_test.go
Normal file
@@ -0,0 +1,70 @@
|
||||
package cli
|
||||
|
||||
import (
|
||||
. "github.com/onsi/ginkgo/v2"
|
||||
. "github.com/onsi/gomega"
|
||||
)
|
||||
|
||||
var _ = Describe("Worker per-replica process keying", func() {
|
||||
Describe("buildProcessKey", func() {
|
||||
// Pin the supervisor's keying contract: distinct replica indexes for
|
||||
// the same modelID produce distinct process keys, so the supervisor
|
||||
// map can hold multiple processes for one model. Dropping the suffix
|
||||
// would re-introduce the original flap (one model, one slot, churn).
|
||||
DescribeTable("produces stable, distinct keys",
|
||||
func(modelID, backend string, replica int, want string) {
|
||||
Expect(buildProcessKey(modelID, backend, replica)).To(Equal(want))
|
||||
},
|
||||
Entry("modelID present, replica 0", "Qwen3-35B", "llama-cpp", 0, "Qwen3-35B#0"),
|
||||
Entry("modelID present, replica 1", "Qwen3-35B", "llama-cpp", 1, "Qwen3-35B#1"),
|
||||
Entry("falls back to backend when modelID empty", "", "llama-cpp", 0, "llama-cpp#0"),
|
||||
Entry("backend fallback with replica 2", "", "llama-cpp", 2, "llama-cpp#2"),
|
||||
)
|
||||
|
||||
It("makes replicas distinguishable", func() {
|
||||
r0 := buildProcessKey("model-a", "llama-cpp", 0)
|
||||
r1 := buildProcessKey("model-a", "llama-cpp", 1)
|
||||
Expect(r0).ToNot(Equal(r1), "replicas of the same model must produce distinct keys")
|
||||
})
|
||||
})
|
||||
|
||||
Describe("registrationBody", func() {
|
||||
It("includes max_replicas_per_model and the auto-label", func() {
|
||||
cmd := &WorkerCMD{
|
||||
Addr: "worker.example.com:50051",
|
||||
MaxReplicasPerModel: 4,
|
||||
}
|
||||
body := cmd.registrationBody()
|
||||
|
||||
Expect(body).To(HaveKey("max_replicas_per_model"))
|
||||
Expect(body["max_replicas_per_model"]).To(Equal(4))
|
||||
|
||||
labels, ok := body["labels"].(map[string]string)
|
||||
Expect(ok).To(BeTrue(), "labels must be present so selectors can target the slot count")
|
||||
Expect(labels).To(HaveKeyWithValue("node.replica-slots", "4"))
|
||||
})
|
||||
|
||||
It("coerces zero/unset MaxReplicasPerModel to 1", func() {
|
||||
cmd := &WorkerCMD{Addr: "worker.example.com:50051"}
|
||||
body := cmd.registrationBody()
|
||||
Expect(body["max_replicas_per_model"]).To(Equal(1),
|
||||
"unset must default to single-replica behavior, not capacity 0")
|
||||
|
||||
labels := body["labels"].(map[string]string)
|
||||
Expect(labels).To(HaveKeyWithValue("node.replica-slots", "1"))
|
||||
})
|
||||
|
||||
It("preserves user-provided labels alongside the auto-label", func() {
|
||||
cmd := &WorkerCMD{
|
||||
Addr: "worker.example.com:50051",
|
||||
MaxReplicasPerModel: 2,
|
||||
NodeLabels: "tier=fast,gpu=a100",
|
||||
}
|
||||
body := cmd.registrationBody()
|
||||
labels := body["labels"].(map[string]string)
|
||||
Expect(labels).To(HaveKeyWithValue("tier", "fast"))
|
||||
Expect(labels).To(HaveKeyWithValue("gpu", "a100"))
|
||||
Expect(labels).To(HaveKeyWithValue("node.replica-slots", "2"))
|
||||
})
|
||||
})
|
||||
})
|
||||
@@ -767,7 +767,7 @@ func (c *ModelConfig) GuessUsecases(u ModelConfigUsecase) bool {
|
||||
}
|
||||
|
||||
if (u & FLAG_VAD) == FLAG_VAD {
|
||||
if c.Backend != "silero-vad" && !(c.Backend == "whisper" && slices.Contains(c.Options, "vad_only")) {
|
||||
if c.Backend != "silero-vad" && c.Backend != "sherpa-onnx" && !(c.Backend == "whisper" && slices.Contains(c.Options, "vad_only")) {
|
||||
return false
|
||||
}
|
||||
}
|
||||
|
||||
@@ -194,6 +194,20 @@ func InstallBackend(ctx context.Context, systemState *system.SystemState, modelL
|
||||
|
||||
name := config.Name
|
||||
backendPath := filepath.Join(systemState.Backend.BackendsPath, name)
|
||||
// Clean up legacy flat-layout artefacts: earlier dev builds of the
|
||||
// golang backends dropped the compiled binary directly at
|
||||
// `<backendsPath>/<name>` (a plain file) instead of
|
||||
// `<backendsPath>/<name>/<name>` (the nested layout the current code
|
||||
// expects). MkdirAll below returns ENOTDIR when such a stale file
|
||||
// exists, permanently blocking any reinstall or upgrade. Remove the
|
||||
// file first so the install can proceed; the new install will write
|
||||
// the correct nested layout, including metadata.json + run.sh.
|
||||
if fi, statErr := os.Lstat(backendPath); statErr == nil && !fi.IsDir() {
|
||||
xlog.Warn("removing stale non-directory backend artefact to make room for fresh install", "path", backendPath)
|
||||
if rmErr := os.Remove(backendPath); rmErr != nil {
|
||||
return fmt.Errorf("failed to remove stale backend artefact at %s: %w", backendPath, rmErr)
|
||||
}
|
||||
}
|
||||
err = os.MkdirAll(backendPath, 0750)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to create base path: %v", err)
|
||||
|
||||
@@ -880,7 +880,7 @@ func convertAnthropicTools(input *schema.AnthropicRequest, cfg *config.ModelConf
|
||||
if tcType, ok := tc["type"].(string); ok && tcType == "tool" {
|
||||
if name, ok := tc["name"].(string); ok {
|
||||
// Force specific tool
|
||||
cfg.SetFunctionCallString(name)
|
||||
cfg.SetFunctionCallNameString(name)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -14,7 +14,13 @@ import (
|
||||
"github.com/mudler/LocalAI/pkg/utils"
|
||||
)
|
||||
|
||||
var audioDataURIPattern = regexp.MustCompile(`^data:([^;]+);base64,`)
|
||||
// Match `data:<mime>[;param=value...];base64,` — MediaRecorder in the browser
|
||||
// produces data URIs like `data:audio/webm;codecs=opus;base64,...`, so the
|
||||
// pre-`;base64,` section can contain zero or more parameter segments. The
|
||||
// old `([^;]+)` form only matched exactly one segment and left recordings
|
||||
// from the React UI's live-capture tab unparsed, which then failed base64
|
||||
// decoding on the leading `data:` bytes.
|
||||
var audioDataURIPattern = regexp.MustCompile(`^data:[^,]+?;base64,`)
|
||||
|
||||
var audioDownloadClient = http.Client{Timeout: 30 * time.Second}
|
||||
|
||||
|
||||
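The difference between the old and new `audioDataURIPattern` can be checked directly. A small sketch, with both expressions copied from the hunk above; the sample URIs and the `oldPattern`/`newPattern` names are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Old form: exactly one segment between "data:" and ";base64,".
	oldPattern = regexp.MustCompile(`^data:([^;]+);base64,`)
	// New form: any run of non-comma characters, so extra ";param=value"
	// segments before ";base64," are accepted.
	newPattern = regexp.MustCompile(`^data:[^,]+?;base64,`)
)

func main() {
	uris := []string{
		"data:audio/wav;base64,AAAA",
		"data:audio/webm;codecs=opus;base64,AAAA", // MediaRecorder-style URI
	}
	for _, u := range uris {
		fmt.Printf("%-42s old=%v new=%v\n", u, oldPattern.MatchString(u), newPattern.MatchString(u))
	}
}
```

Only the new pattern matches the URI that carries a `codecs=opus` parameter, so only the new pattern lets the prefix be stripped before base64 decoding.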
@@ -98,7 +98,7 @@ func (mgs *BackendEndpointService) GetAllStatusEndpoint() echo.HandlerFunc {
|
||||
// @Param request body GalleryBackend true "query params"
|
||||
// @Success 200 {object} schema.BackendResponse "Response"
|
||||
// @Router /backends/apply [post]
|
||||
func (mgs *BackendEndpointService) ApplyBackendEndpoint() echo.HandlerFunc {
|
||||
func (mgs *BackendEndpointService) ApplyBackendEndpoint(systemState *system.SystemState) echo.HandlerFunc {
|
||||
return func(c echo.Context) error {
|
||||
input := new(GalleryBackend)
|
||||
// Get input data from the request body
|
||||
@@ -106,6 +106,18 @@ func (mgs *BackendEndpointService) ApplyBackendEndpoint() echo.HandlerFunc {
|
||||
return err
|
||||
}
|
||||
|
||||
// In distributed mode, refuse to fan out a hardware-specific build to
|
||||
// every node — a CPU build landing on a GPU cluster is almost always
|
||||
// wrong, and the silent footgun is exactly what this guard exists for.
|
||||
// Auto-resolving (meta) backends are fine because each node picks its
|
||||
// own variant. Tooling can recover by hitting
|
||||
// POST /api/nodes/{id}/backends/install per target node.
|
||||
if mgs.backendApplier.BackendManager().IsDistributed() && input.ID != "" {
|
||||
if guard := concreteFanOutGuard(c, mgs.galleries, systemState, input.ID); guard != nil {
|
||||
return guard
|
||||
}
|
||||
}
|
||||
|
||||
uuid, err := uuid.NewUUID()
|
||||
if err != nil {
|
||||
return err
|
||||
@@ -120,6 +132,66 @@ func (mgs *BackendEndpointService) ApplyBackendEndpoint() echo.HandlerFunc {
|
||||
}
|
||||
}
|
||||
|
||||
// concreteFanOutGuard returns a 409 response if the requested backend is a
|
||||
// hardware-specific build (not auto-resolving / meta) and we are in
|
||||
// distributed mode. It looks up the backend in the configured galleries; if
|
||||
// the lookup itself fails (gallery unreachable, name not found), the guard
|
||||
// stays out of the way and lets the install enqueue normally — a missing
|
||||
// name will surface from the worker as a clearer error than the guard could
|
||||
// produce here. The response body deliberately speaks human, with `code` and
|
||||
// `meta_alternative` as the programmatic contract for tooling.
|
||||
func concreteFanOutGuard(c echo.Context, galleries []config.Gallery, systemState *system.SystemState, backendID string) error {
|
||||
// Use the unfiltered listing because in distributed mode the frontend's
|
||||
// hardware is irrelevant — the install targets workers, not us — and the
|
||||
// filtered list would hide variants that don't match the frontend host
|
||||
// (e.g. a CUDA build on a CPU-only frontend), preventing the guard from
|
||||
// firing for exactly the cases it's meant to protect against.
|
||||
available, err := gallery.AvailableBackendsUnfiltered(galleries, systemState)
|
||||
if err != nil {
|
||||
return nil
|
||||
}
|
||||
requested := available.FindByName(backendID)
|
||||
if requested == nil || requested.IsMeta() {
|
||||
return nil
|
||||
}
|
||||
|
||||
// Try to find an auto-resolving (meta) backend that has this concrete
|
||||
// variant in its CapabilitiesMap, so we can suggest it as a one-shot
|
||||
// alternative. Optional — empty string is fine if no parent exists.
|
||||
metaAlternative := ""
|
||||
for _, b := range available {
|
||||
if !b.IsMeta() {
|
||||
continue
|
||||
}
|
||||
for _, concrete := range b.CapabilitiesMap {
|
||||
if concrete == backendID {
|
||||
metaAlternative = b.Name
|
||||
break
|
||||
}
|
||||
}
|
||||
if metaAlternative != "" {
|
||||
break
|
||||
}
|
||||
}
|
||||
|
||||
msg := fmt.Sprintf(
|
||||
"Backend %q is a hardware-specific build and won't run correctly on every node in this cluster. In distributed mode, install it on specific nodes:\n\n POST /api/nodes/{node_id}/backends/install\n {\"backend\": %q}",
|
||||
backendID, backendID,
|
||||
)
|
||||
if metaAlternative != "" {
|
||||
msg += fmt.Sprintf(
|
||||
"\n\nTo install across all nodes, use the auto-resolving backend %q — each node picks its own variant based on its hardware.",
|
||||
metaAlternative,
|
||||
)
|
||||
}
|
||||
|
||||
return c.JSON(409, map[string]any{
|
||||
"error": msg,
|
||||
"code": "concrete_backend_requires_target",
|
||||
"meta_alternative": metaAlternative,
|
||||
})
|
||||
}
|
||||
|
||||
// DeleteBackendEndpoint lets delete backends from a LocalAI instance
|
||||
// @Summary delete backends from LocalAI.
|
||||
// @Tags backends
|
||||
|
||||
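The meta-alternative lookup inside concreteFanOutGuard reduces to a nested scan. The sketch below uses a hypothetical, trimmed-down `backend` type and made-up backend names; treating "has a non-empty CapabilitiesMap" as the meaning of meta is an assumption of this sketch, not something the hunk states.

```go
package main

import "fmt"

// backend is a hypothetical, trimmed-down gallery entry. CapabilitiesMap
// maps a capability name to the concrete backend that serves it.
type backend struct {
	Name            string
	CapabilitiesMap map[string]string
}

// isMeta treats any entry with a capabilities map as auto-resolving
// (assumption made for this sketch).
func (b backend) isMeta() bool { return len(b.CapabilitiesMap) > 0 }

// findMetaAlternative returns the name of the first meta backend that lists
// backendID as one of its concrete variants, or "" when none does.
func findMetaAlternative(available []backend, backendID string) string {
	for _, b := range available {
		if !b.isMeta() {
			continue
		}
		for _, concrete := range b.CapabilitiesMap {
			if concrete == backendID {
				return b.Name
			}
		}
	}
	return ""
}

func main() {
	// Illustrative names only.
	available := []backend{
		{Name: "cuda12-llama-cpp"},
		{Name: "cpu-llama-cpp"},
		{Name: "llama-cpp", CapabilitiesMap: map[string]string{
			"nvidia":  "cuda12-llama-cpp",
			"default": "cpu-llama-cpp",
		}},
	}
	fmt.Printf("%q\n", findMetaAlternative(available, "cuda12-llama-cpp"))
	fmt.Printf("%q\n", findMetaAlternative(available, "standalone-backend"))
}
```

Returning early from the helper replaces the double `break` in the handler; the empty string keeps the same "no parent found" meaning that the 409 body exposes as `meta_alternative`.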
@@ -73,6 +73,10 @@ type RegisterNodeRequest struct {
|
||||
AvailableRAM uint64 `json:"available_ram,omitempty"`
|
||||
GPUVendor string `json:"gpu_vendor,omitempty"`
|
||||
Labels map[string]string `json:"labels,omitempty"`
|
||||
// MaxReplicasPerModel is the per-node cap on replicas of any single model.
|
||||
// Workers older than this field omit it; we coerce 0 → 1 below to preserve
|
||||
// historical single-replica behavior.
|
||||
MaxReplicasPerModel int `json:"max_replicas_per_model,omitempty"`
|
||||
}
|
||||
|
||||
// RegisterNodeEndpoint registers a new backend node.
|
||||
@@ -131,17 +135,26 @@ func RegisterNodeEndpoint(registry *nodes.NodeRegistry, expectedToken string, au
|
||||
tokenHash = hex.EncodeToString(h[:])
|
||||
}
|
||||
|
||||
// Coerce 0 → 1 for backward compat with workers that don't send the field.
|
||||
// GORM's `default:1` only fires for a missing column; once Go zero-values
|
||||
// reach the struct field they're written as 0 unless explicitly set here.
|
||||
maxReplicasPerModel := req.MaxReplicasPerModel
|
||||
if maxReplicasPerModel < 1 {
|
||||
maxReplicasPerModel = 1
|
||||
}
|
||||
|
||||
node := &nodes.BackendNode{
|
||||
Name: req.Name,
|
||||
NodeType: nodeType,
|
||||
Address: req.Address,
|
||||
HTTPAddress: req.HTTPAddress,
|
||||
TokenHash: tokenHash,
|
||||
TotalVRAM: req.TotalVRAM,
|
||||
AvailableVRAM: req.AvailableVRAM,
|
||||
TotalRAM: req.TotalRAM,
|
||||
AvailableRAM: req.AvailableRAM,
|
||||
GPUVendor: req.GPUVendor,
|
||||
Name: req.Name,
|
||||
NodeType: nodeType,
|
||||
Address: req.Address,
|
||||
HTTPAddress: req.HTTPAddress,
|
||||
TokenHash: tokenHash,
|
||||
TotalVRAM: req.TotalVRAM,
|
||||
AvailableVRAM: req.AvailableVRAM,
|
||||
TotalRAM: req.TotalRAM,
|
||||
AvailableRAM: req.AvailableRAM,
|
||||
GPUVendor: req.GPUVendor,
|
||||
MaxReplicasPerModel: maxReplicasPerModel,
|
||||
}
|
||||
|
||||
ctx := c.Request().Context()
|
||||
@@ -363,6 +376,9 @@ func ResumeNodeEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
|
||||
}
|
||||
|
||||
// InstallBackendOnNodeEndpoint triggers backend installation on a worker node via NATS.
|
||||
// Backend can be either a gallery ID (resolved against BackendGalleries) or a
|
||||
// direct URI install (URI + Name + optional Alias) — same shape as the
|
||||
// standalone /api/backends/install-external path, just scoped to one node.
|
||||
func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.HandlerFunc {
|
||||
return func(c echo.Context) error {
|
||||
if unloader == nil {
|
||||
@@ -372,17 +388,27 @@ func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.Handler
|
||||
var req struct {
|
||||
Backend string `json:"backend"`
|
||||
BackendGalleries string `json:"backend_galleries,omitempty"`
|
||||
URI string `json:"uri,omitempty"`
|
||||
Name string `json:"name,omitempty"`
|
||||
Alias string `json:"alias,omitempty"`
|
||||
}
|
||||
if err := c.Bind(&req); err != nil || req.Backend == "" {
|
||||
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name required"))
|
||||
if err := c.Bind(&req); err != nil {
|
||||
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "invalid request body"))
|
||||
}
|
||||
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, "", "", "")
|
||||
// Either a gallery backend name or a direct URI must be supplied.
|
||||
if req.Backend == "" && req.URI == "" {
|
||||
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name or uri required"))
|
||||
}
|
||||
// Admin-driven backend install: not tied to a specific replica slot
|
||||
// (no model is being loaded). Pass replica 0 to match the worker's
|
||||
// admin process-key convention (`backend#0`).
|
||||
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, req.URI, req.Name, req.Alias, 0)
|
||||
if err != nil {
|
||||
xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "error", err)
|
||||
xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "uri", req.URI, "error", err)
|
||||
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to install backend on node"))
|
||||
}
|
||||
if !reply.Success {
|
||||
xlog.Error("Backend install failed on node", "node", nodeID, "backend", req.Backend, "error", reply.Error)
|
||||
xlog.Error("Backend install failed on node", "node", nodeID, "backend", req.Backend, "uri", req.URI, "error", reply.Error)
|
||||
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "backend installation failed"))
|
||||
}
|
||||
return c.JSON(http.StatusOK, map[string]string{"message": "backend installed"})
|
||||
@@ -457,8 +483,8 @@ func UnloadModelOnNodeEndpoint(unloader nodes.NodeCommandSender, registry *nodes
|
||||
xlog.Error("Failed to stop backend after model unload", "node", nodeID, "model", req.ModelName, "error", err)
|
||||
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "model unloaded but backend stop failed"))
|
||||
}
|
||||
// Remove from registry
|
||||
registry.RemoveNodeModel(c.Request().Context(), nodeID, req.ModelName)
|
||||
// Remove every replica of this model on the node from the registry.
|
||||
registry.RemoveAllNodeModelReplicas(c.Request().Context(), nodeID, req.ModelName)
|
||||
return c.JSON(http.StatusOK, map[string]string{"message": "model unloaded"})
|
||||
}
|
||||
}
|
||||
@@ -484,7 +510,7 @@ func DeleteModelOnNodeEndpoint(unloader nodes.NodeCommandSender, registry *nodes
|
||||
// Non-fatal — backend process may not be running
|
||||
xlog.Warn("StopBackend failed during model deletion (non-fatal)", "node", nodeID, "model", req.ModelName, "error", err)
|
||||
}
|
||||
registry.RemoveNodeModel(c.Request().Context(), nodeID, req.ModelName)
|
||||
registry.RemoveAllNodeModelReplicas(c.Request().Context(), nodeID, req.ModelName)
|
||||
return c.JSON(http.StatusOK, map[string]string{"message": "model deleted from node"})
|
||||
}
|
||||
}
|
||||
@@ -659,6 +685,78 @@ func GetNodeLabelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
|
||||
}
|
||||
}
|
||||
|
||||
// UpdateMaxReplicasPerModelRequest is the body for the per-node replica cap endpoint.
|
||||
type UpdateMaxReplicasPerModelRequest struct {
|
||||
// Value is the new per-model replica cap on this node. Must be >= 1.
|
||||
Value int `json:"value"`
|
||||
}
|
||||
|
||||
// UpdateMaxReplicasPerModelEndpoint sets the per-node cap on how many replicas
|
||||
// of any one model can be loaded concurrently. The corresponding
|
||||
// `node.replica-slots` auto-label is refreshed so existing AND-selectors keep
|
||||
// matching, and any unsatisfiable scheduling cooldowns are cleared so the
|
||||
// reconciler retries on the next tick.
|
||||
//
|
||||
// This is a transient admin override — a worker re-registration restores the
|
||||
// value the worker was started with (--max-replicas-per-model). For permanent
|
||||
// fleet changes, change the worker flag.
|
||||
//
|
||||
// @Summary Update a node's max replicas per model
|
||||
// @Tags Nodes
|
||||
// @Param id path string true "Node ID"
|
||||
// @Param request body UpdateMaxReplicasPerModelRequest true "New value"
|
||||
// @Success 200 {object} map[string]int
|
||||
// @Failure 400 {object} map[string]any "value must be >= 1"
|
||||
// @Failure 404 {object} map[string]any "node not found"
|
||||
// @Router /api/nodes/{id}/max-replicas-per-model [put]
|
||||
func UpdateMaxReplicasPerModelEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
|
||||
return func(c echo.Context) error {
|
||||
ctx := c.Request().Context()
|
||||
nodeID := c.Param("id")
|
||||
if _, err := registry.Get(ctx, nodeID); err != nil {
|
||||
return c.JSON(http.StatusNotFound, nodeError(http.StatusNotFound, "node not found"))
|
||||
}
|
||||
var req UpdateMaxReplicasPerModelRequest
|
||||
if err := c.Bind(&req); err != nil {
|
||||
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "invalid request body"))
|
||||
}
|
||||
if req.Value < 1 {
|
||||
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "value must be >= 1"))
|
||||
}
|
||||
if err := registry.UpdateMaxReplicasPerModel(ctx, nodeID, req.Value); err != nil {
|
||||
xlog.Error("Failed to update max_replicas_per_model", "node", nodeID, "value", req.Value, "error", err)
|
||||
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to update max replicas per model"))
|
||||
}
|
||||
return c.JSON(http.StatusOK, map[string]int{"max_replicas_per_model": req.Value})
|
||||
}
|
||||
}
|
||||
|
||||
// ResetMaxReplicasPerModelEndpoint clears the admin override on a node, so
|
||||
// the next worker re-registration is allowed to update the value from its
|
||||
// CLI flag again. The current value is left in place until the worker calls
|
||||
// register.
|
||||
//
|
||||
// @Summary Reset a node's max replicas per model to the worker default
|
||||
// @Tags Nodes
|
||||
// @Param id path string true "Node ID"
|
||||
// @Success 200 {object} map[string]bool
|
||||
// @Failure 404 {object} map[string]any "node not found"
|
||||
// @Router /api/nodes/{id}/max-replicas-per-model [delete]
|
||||
func ResetMaxReplicasPerModelEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
|
||||
return func(c echo.Context) error {
|
||||
ctx := c.Request().Context()
|
||||
nodeID := c.Param("id")
|
||||
if _, err := registry.Get(ctx, nodeID); err != nil {
|
||||
return c.JSON(http.StatusNotFound, nodeError(http.StatusNotFound, "node not found"))
|
||||
}
|
||||
if err := registry.ResetMaxReplicasPerModel(ctx, nodeID); err != nil {
|
||||
xlog.Error("Failed to reset max_replicas_per_model override", "node", nodeID, "error", err)
|
||||
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to reset override"))
|
||||
}
|
||||
return c.JSON(http.StatusOK, map[string]bool{"reset": true})
|
||||
}
|
||||
}
|
||||
|
||||
// SetNodeLabelsEndpoint replaces all labels for a node.
|
||||
func SetNodeLabelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
|
||||
return func(c echo.Context) error {
|
||||
|
||||
@@ -1315,13 +1315,35 @@ func triggerResponse(ctx context.Context, session *Session, conv *Conversation,
|
||||
}
|
||||
thinkingStartToken := reasoning.DetectThinkingStartToken(template, &config.ReasoningConfig)
|
||||
|
||||
reasoningText, responseWithoutReasoning := reasoning.ExtractReasoningWithConfig(rawResponse, thinkingStartToken, config.ReasoningConfig)
|
||||
// When the C++ autoparser emitted ChatDeltas with actionable data,
|
||||
// prefer them — the backend clears Reply.Message in that path and
|
||||
// delivers parsed content/reasoning/tool-calls via the delta stream
|
||||
// (see pkg/functions/chat_deltas.go, mirrored from chat.go's non-SSE
|
||||
// handling). Without this, Response is empty and realtime would
|
||||
// synthesize silence for replies that actually produced tokens.
|
||||
var reasoningText, responseWithoutReasoning, textContent, cleanedResponse string
|
||||
var toolCalls []functions.FuncCallResults
|
||||
deltaToolCalls := functions.ToolCallsFromChatDeltas(pred.ChatDeltas)
|
||||
deltaContent := functions.ContentFromChatDeltas(pred.ChatDeltas)
|
||||
deltaReasoning := functions.ReasoningFromChatDeltas(pred.ChatDeltas)
|
||||
if len(deltaToolCalls) > 0 || deltaContent != "" {
|
||||
xlog.Debug("[ChatDeltas] realtime: using C++ autoparser deltas",
|
||||
"tool_calls", len(deltaToolCalls),
|
||||
"content_len", len(deltaContent),
|
||||
"reasoning_len", len(deltaReasoning))
|
||||
reasoningText = deltaReasoning
|
||||
responseWithoutReasoning = deltaContent
|
||||
textContent = deltaContent
|
||||
cleanedResponse = deltaContent
|
||||
toolCalls = deltaToolCalls
|
||||
} else {
|
||||
reasoningText, responseWithoutReasoning = reasoning.ExtractReasoningWithConfig(rawResponse, thinkingStartToken, config.ReasoningConfig)
|
||||
textContent = functions.ParseTextContent(responseWithoutReasoning, config.FunctionsConfig)
|
||||
cleanedResponse = functions.CleanupLLMResult(responseWithoutReasoning, config.FunctionsConfig)
|
||||
toolCalls = functions.ParseFunctionCall(cleanedResponse, config.FunctionsConfig)
|
||||
}
|
||||
xlog.Debug("LLM Response", "reasoning", reasoningText, "response_without_reasoning", responseWithoutReasoning)
|
||||
|
||||
textContent := functions.ParseTextContent(responseWithoutReasoning, config.FunctionsConfig)
|
||||
cleanedResponse := functions.CleanupLLMResult(responseWithoutReasoning, config.FunctionsConfig)
|
||||
toolCalls := functions.ParseFunctionCall(cleanedResponse, config.FunctionsConfig)
|
||||
|
||||
xlog.Debug("Function call parsing", "textContent", textContent, "cleanedResponse", cleanedResponse, "toolCallsCount", len(toolCalls))
|
||||
|
||||
noActionName := "answer"
|
||||
|
||||
@@ -168,7 +168,7 @@ func (m *wrappedModel) Predict(ctx context.Context, messages schema.Messages, im
|
||||
}
|
||||
} else if toolChoice.Function != nil {
|
||||
// Specific function specified
|
||||
m.LLMConfig.SetFunctionCallString(toolChoice.Function.Name)
|
||||
m.LLMConfig.SetFunctionCallNameString(toolChoice.Function.Name)
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -773,7 +773,7 @@ func convertORToolsToFunctions(input *schema.OpenResponsesRequest, cfg *config.M
|
||||
case map[string]any:
|
||||
if tcType, ok := tc["type"].(string); ok && tcType == "function" {
|
||||
if name, ok := tc["name"].(string); ok {
|
||||
cfg.SetFunctionCallString(name)
|
||||
cfg.SetFunctionCallNameString(name)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,29 +1,32 @@
|
||||
import { test, expect } from '@playwright/test'
|
||||
|
||||
test.describe('Manage Page - Backend Logs Link', () => {
|
||||
test('models table shows terminal icon for logs', async ({ page }) => {
|
||||
test('row action menu exposes Backend logs entry with terminal icon', async ({ page }) => {
|
||||
await page.goto('/app/manage')
|
||||
// Wait for models to load
|
||||
await expect(page.locator('.table')).toBeVisible({ timeout: 10_000 })
|
||||
|
||||
// Check for terminal icon (backend logs link)
|
||||
const terminalIcon = page.locator('a[title="Backend logs"] i.fa-terminal')
|
||||
await expect(terminalIcon.first()).toBeVisible()
|
||||
// Row actions live behind the kebab (ActionMenu) — open the first row's menu.
|
||||
const trigger = page.locator('button.action-menu__trigger').first()
|
||||
await expect(trigger).toBeVisible()
|
||||
await trigger.click()
|
||||
|
||||
const logsItem = page.getByRole('menuitem', { name: 'Backend logs' })
|
||||
await expect(logsItem).toBeVisible()
|
||||
await expect(logsItem.locator('i.fa-terminal')).toBeVisible()
|
||||
})
|
||||
|
||||
test('terminal icon links to backend-logs page', async ({ page }) => {
|
||||
test('Backend logs menu item navigates to backend-logs page', async ({ page }) => {
|
||||
await page.goto('/app/manage')
|
||||
await expect(page.locator('.table')).toBeVisible({ timeout: 10_000 })
|
||||
|
||||
const logsLink = page.locator('a[title="Backend logs"]').first()
|
||||
await expect(logsLink).toBeVisible()
|
||||
const trigger = page.locator('button.action-menu__trigger').first()
|
||||
await expect(trigger).toBeVisible()
|
||||
await trigger.click()
|
||||
|
||||
// Link uses href="#" with onClick for navigation
|
||||
const href = await logsLink.getAttribute('href')
|
||||
expect(href).toBe('#')
|
||||
const logsItem = page.getByRole('menuitem', { name: 'Backend logs' })
|
||||
await expect(logsItem).toBeVisible()
|
||||
await logsItem.click()
|
||||
|
||||
// Click and verify navigation
|
||||
await logsLink.click()
|
||||
await expect(page).toHaveURL(/\/app\/backend-logs\//)
|
||||
})
|
||||
})
|
||||
|
||||
166
core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js
Normal file
@@ -0,0 +1,166 @@
|
||||
import { test, expect } from '@playwright/test'
|
||||
|
||||
// These specs cover the per-node backend row in the Nodes page:
|
||||
// - the upgrade affordance is self-explanatory (icon + tooltip)
|
||||
// - a delete affordance is present and goes through ConfirmDialog
|
||||
//
|
||||
// We mock the distributed-mode API so the tests can run against the
|
||||
// standalone ui-test-server without spinning up workers/NATS.
|
||||
|
||||
const NODE_ID = 'test-node-1'
|
||||
const NODE_NAME = 'worker-test'
|
||||
const BACKEND_NAME = 'cuda12-vllm-development'
|
||||
|
||||
async function mockDistributedNodes(page, { onDelete } = {}) {
|
||||
await page.route('**/api/nodes', (route) => {
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
contentType: 'application/json',
|
||||
body: JSON.stringify([
|
||||
{
|
||||
id: NODE_ID,
|
||||
name: NODE_NAME,
|
||||
node_type: 'backend',
|
||||
address: '10.0.0.1:50051',
|
||||
http_address: '10.0.0.1:8090',
|
||||
status: 'healthy',
|
||||
total_vram: 0,
|
||||
available_vram: 0,
|
||||
total_ram: 8_000_000_000,
|
||||
available_ram: 4_000_000_000,
|
||||
gpu_vendor: '',
|
||||
last_heartbeat: new Date().toISOString(),
|
||||
created_at: new Date().toISOString(),
|
||||
updated_at: new Date().toISOString(),
|
||||
},
|
||||
]),
|
||||
})
|
||||
})
|
||||
|
||||
await page.route('**/api/nodes/scheduling', (route) => {
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
contentType: 'application/json',
|
||||
body: '[]',
|
||||
})
|
||||
})
|
||||
|
||||
await page.route(`**/api/nodes/${NODE_ID}/models`, (route) => {
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
contentType: 'application/json',
|
||||
body: '[]',
|
||||
})
|
||||
})
|
||||
|
||||
await page.route(`**/api/nodes/${NODE_ID}/backends`, (route) => {
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
contentType: 'application/json',
|
||||
body: JSON.stringify([
|
||||
{
|
||||
name: BACKEND_NAME,
|
||||
is_system: false,
|
||||
is_meta: false,
|
||||
installed_at: new Date().toISOString(),
|
||||
},
|
||||
]),
|
||||
})
|
||||
})
|
||||
|
||||
await page.route(`**/api/nodes/${NODE_ID}/backends/delete`, async (route) => {
|
||||
if (onDelete) {
|
||||
await onDelete(route)
|
||||
}
|
||||
route.fulfill({
|
||||
status: 200,
|
||||
contentType: 'application/json',
|
||||
body: JSON.stringify({ message: 'backend deleted' }),
|
||||
})
|
||||
})
|
||||
}
|
||||
|
||||
async function expandNodeAndWaitForBackends(page) {
|
||||
await page.goto('/app/nodes')
|
||||
// Click the row to expand it. The chevron toggle and the row both work,
|
||||
// but clicking the name cell is the most user-like.
|
||||
await page.getByText(NODE_NAME).first().click()
|
||||
// Backends, Capacity and Labels live behind a "Manage" <details>
|
||||
// disclosure (the drawer was distilled to keep at-a-glance content
|
||||
// lean — see distill refactor in the multi-replica branch). Open it
|
||||
// by clicking the summary inside the .node-manage scope so the
|
||||
// per-node backend table is in the DOM before assertions run.
|
||||
await page.locator('.node-manage > summary').first().click()
|
||||
await expect(page.getByRole('cell', { name: BACKEND_NAME, exact: true })).toBeVisible({ timeout: 10_000 })
|
||||
}
|
||||
|
||||
test.describe('Nodes page — per-node backend actions', () => {
|
||||
test('upgrade affordance is self-explanatory (not "Reinstall backend" with a sync icon)', async ({ page }) => {
|
||||
await mockDistributedNodes(page)
|
||||
await expandNodeAndWaitForBackends(page)
|
||||
|
||||
// Negative: the old, ambiguous wording must not be used.
|
||||
await expect(page.locator('button[title="Reinstall backend"]')).toHaveCount(0)
|
||||
await expect(page.locator('button[title="Reinstall backend"] i.fa-sync-alt')).toHaveCount(0)
|
||||
|
||||
// Positive: a self-explanatory upgrade affordance is rendered next to the
|
||||
// backend row. We accept either an arrow-up or arrows-rotate glyph; both
|
||||
// map to "upgrade" semantics in FontAwesome 6 unambiguously.
|
||||
const upgradeBtn = page.locator('button[title="Upgrade backend on this node"]')
|
||||
await expect(upgradeBtn).toBeVisible()
|
||||
const iconClass = await upgradeBtn.locator('i').getAttribute('class')
|
||||
expect(iconClass).toMatch(/fa-(arrow-up|arrows-rotate|up-long)/)
|
||||
})
|
||||
|
||||
test('per-node backend row shows a delete (trash) button next to upgrade', async ({ page }) => {
|
||||
await mockDistributedNodes(page)
|
||||
await expandNodeAndWaitForBackends(page)
|
||||
|
||||
const deleteBtn = page.locator('button[title="Delete backend from this node"]')
|
||||
await expect(deleteBtn).toBeVisible()
|
||||
await expect(deleteBtn.locator('i.fa-trash')).toBeVisible()
|
||||
})
|
||||
|
||||
test('clicking delete opens the confirm dialog and POSTs to the per-node delete endpoint', async ({ page }) => {
|
||||
let postedBody = null
|
||||
await mockDistributedNodes(page, {
|
||||
onDelete: async (route) => {
|
||||
postedBody = route.request().postDataJSON()
|
||||
},
|
||||
})
|
||||
await expandNodeAndWaitForBackends(page)
|
||||
|
||||
await page.locator('button[title="Delete backend from this node"]').click()
|
||||
|
||||
// ConfirmDialog uses role="alertdialog" and a danger confirm button.
|
||||
const dialog = page.getByRole('alertdialog')
|
||||
await expect(dialog).toBeVisible()
|
||||
const confirmBtn = dialog.locator('button.btn-danger')
|
||||
await expect(confirmBtn).toBeVisible()
|
||||
await confirmBtn.click()
|
||||
|
||||
// Wait until the POST landed.
|
||||
await expect.poll(() => postedBody, { timeout: 5_000 }).toEqual({ backend: BACKEND_NAME })
|
||||
})
|
||||
|
||||
test('clicking delete and cancelling does not POST', async ({ page }) => {
|
||||
let deleteCalls = 0
|
||||
await mockDistributedNodes(page, {
|
||||
onDelete: () => {
|
||||
deleteCalls += 1
|
||||
},
|
||||
})
|
||||
await expandNodeAndWaitForBackends(page)
|
||||
|
||||
await page.locator('button[title="Delete backend from this node"]').click()
|
||||
|
||||
const dialog = page.getByRole('alertdialog')
|
||||
await expect(dialog).toBeVisible()
|
||||
await dialog.getByRole('button', { name: /cancel/i }).click()
|
||||
await expect(dialog).toBeHidden()
|
||||
|
||||
// Give any errant request a moment to fire so a regression would be caught.
|
||||
await page.waitForTimeout(500)
|
||||
expect(deleteCalls).toBe(0)
|
||||
})
|
||||
})
|
||||
@@ -7,7 +7,7 @@
|
||||
<link rel="icon" type="image/svg+xml" href="/favicon.svg" />
|
||||
<link rel="preconnect" href="https://fonts.googleapis.com" />
|
||||
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
|
||||
<link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600&display=swap" rel="stylesheet" />
|
||||
<link href="https://fonts.googleapis.com/css2?family=Geist:wght@300..700&family=Geist+Mono:wght@300..700&display=swap" rel="stylesheet" />
|
||||
</head>
|
||||
<body>
|
||||
<div id="root"></div>
|
||||
|
||||
7
core/http/react-ui/package-lock.json
generated
@@ -3258,9 +3258,9 @@
|
||||
}
|
||||
},
|
||||
"node_modules/postcss": {
|
||||
"version": "8.5.8",
|
||||
"resolved": "https://registry.npmjs.org/postcss/-/postcss-8.5.8.tgz",
|
||||
"integrity": "sha512-OW/rX8O/jXnm82Ey1k44pObPtdblfiuWnrd8X7GJ7emImCOstunGbXUpp7HdBrFQX6rJzn3sPT397Wp5aCwCHg==",
|
||||
"version": "8.5.10",
|
||||
"resolved": "https://registry.npmjs.org/postcss/-/postcss-8.5.10.tgz",
|
||||
"integrity": "sha512-pMMHxBOZKFU6HgAZ4eyGnwXF/EvPGGqUr0MnZ5+99485wwW41kW91A4LOGxSHhgugZmSChL5AlElNdwlNgcnLQ==",
|
||||
"dev": true,
|
||||
"funding": [
|
||||
{
|
||||
@@ -3276,6 +3276,7 @@
|
||||
"url": "https://github.com/sponsors/ai"
|
||||
}
|
||||
],
|
||||
"license": "MIT",
|
||||
"dependencies": {
|
||||
"nanoid": "^3.3.11",
|
||||
"picocolors": "^1.1.1",
|
||||
|
||||
File diff suppressed because it is too large
@@ -1,9 +1,11 @@
|
||||
import { useState, useEffect } from 'react'
|
||||
import { Outlet, useLocation } from 'react-router-dom'
|
||||
import { useState, useEffect, useRef } from 'react'
|
||||
import { Outlet, useLocation, useNavigate } from 'react-router-dom'
|
||||
import Sidebar from './components/Sidebar'
|
||||
import OperationsBar from './components/OperationsBar'
|
||||
import { ToastContainer, useToast } from './components/Toast'
|
||||
import { systemApi } from './utils/api'
|
||||
import { useTheme } from './contexts/ThemeContext'
|
||||
import { useAuth } from './context/AuthContext'
|
||||
|
||||
const COLLAPSED_KEY = 'localai_sidebar_collapsed'
|
||||
|
||||
@@ -15,6 +17,10 @@ export default function App() {
|
||||
const { toasts, addToast, removeToast } = useToast()
|
||||
const [version, setVersion] = useState('')
|
||||
const location = useLocation()
|
||||
const navigate = useNavigate()
|
||||
const { theme, toggleTheme } = useTheme()
|
||||
const { authEnabled, user } = useAuth()
|
||||
const hamburgerRef = useRef(null)
|
||||
const isChatRoute = location.pathname.match(/\/chat(\/|$)/) || location.pathname.match(/\/agents\/[^/]+\/chat/)
|
||||
|
||||
useEffect(() => {
|
||||
@@ -34,26 +40,80 @@ export default function App() {
|
||||
window.scrollTo(0, 0)
|
||||
}, [location.pathname])
|
||||
|
||||
// Drawer polish: lock body scroll, close on Escape, return focus to the
|
||||
// hamburger when the drawer closes. Only engages when the drawer is open;
|
||||
// desktop and tablet rail mode are unaffected.
|
||||
useEffect(() => {
|
||||
if (!sidebarOpen) return
|
||||
const prevOverflow = document.body.style.overflow
|
||||
document.body.style.overflow = 'hidden'
|
||||
const onKey = (e) => { if (e.key === 'Escape') setSidebarOpen(false) }
|
||||
window.addEventListener('keydown', onKey)
|
||||
return () => {
|
||||
document.body.style.overflow = prevOverflow
|
||||
window.removeEventListener('keydown', onKey)
|
||||
// Restore focus to the trigger so keyboard users land back where
|
||||
// they invoked the drawer from.
|
||||
hamburgerRef.current?.focus()
|
||||
}
|
||||
}, [sidebarOpen])
|
||||
|
||||
const layoutClasses = [
|
||||
'app-layout',
|
||||
isChatRoute ? 'app-layout-chat' : '',
|
||||
sidebarCollapsed ? 'sidebar-is-collapsed' : '',
|
||||
].filter(Boolean).join(' ')
|
||||
|
||||
const showAvatar = authEnabled && user
|
||||
const accountLabel = user?.name || user?.email || 'Account'
|
||||
|
||||
return (
|
||||
<div className={layoutClasses}>
|
||||
<Sidebar isOpen={sidebarOpen} onClose={() => setSidebarOpen(false)} />
|
||||
<main className="main-content">
|
||||
<main className="main-content" {...(sidebarOpen ? { 'aria-hidden': 'true', inert: '' } : {})}>
|
||||
<OperationsBar />
|
||||
{/* Mobile header */}
|
||||
{/* Mobile header — primary actions reachable without opening the
|
||||
drawer. Hamburger is the only way to expand the nav on phones;
|
||||
theme toggle and account avatar are mirrored from the sidebar
|
||||
footer so they remain one tap away. */}
|
||||
<header className="mobile-header">
|
||||
<button
|
||||
ref={hamburgerRef}
|
||||
className="hamburger-btn"
|
||||
onClick={() => setSidebarOpen(true)}
|
||||
aria-label="Open menu"
|
||||
aria-expanded={sidebarOpen}
|
||||
aria-controls="app-sidebar"
|
||||
>
|
||||
<i className="fas fa-bars" />
|
||||
<i className="fas fa-bars" aria-hidden="true" />
|
||||
</button>
|
||||
<span className="mobile-title">LocalAI</span>
|
||||
<div className="mobile-header-actions">
|
||||
<button
|
||||
type="button"
|
||||
className="mobile-header-btn"
|
||||
onClick={toggleTheme}
|
||||
aria-label={`Switch to ${theme === 'dark' ? 'light' : 'dark'} mode`}
|
||||
title={`Switch to ${theme === 'dark' ? 'light' : 'dark'} mode`}
|
||||
>
|
||||
<i className={`fas ${theme === 'dark' ? 'fa-sun' : 'fa-moon'}`} aria-hidden="true" />
|
||||
</button>
|
||||
{showAvatar && (
|
||||
<button
|
||||
type="button"
|
||||
className="mobile-header-btn mobile-header-avatar"
|
||||
onClick={() => navigate('/app/account')}
|
||||
aria-label={`Account: ${accountLabel}`}
|
||||
title={accountLabel}
|
||||
>
|
||||
{user.avatarUrl ? (
|
||||
<img src={user.avatarUrl} alt="" />
|
||||
) : (
|
||||
<i className="fas fa-user-circle" aria-hidden="true" />
|
||||
)}
|
||||
</button>
|
||||
)}
|
||||
</div>
|
||||
</header>
|
||||
<div className="main-content-inner">
|
||||
<div className="page-transition" key={location.pathname}>
|
||||
|
||||
141
core/http/react-ui/src/components/ActionMenu.jsx
Normal file
@@ -0,0 +1,141 @@
import { useRef, useState, useEffect, useCallback } from 'react'
import Popover from './Popover'

// ActionMenu renders a kebab (three-dot) button that opens a popover with a
// list of row actions. Replaces the inline cluster of icon buttons that made
// dense tables feel like a control panel — actions stay out of the way until
// the user reaches for them, the way Linear/Vercel/Notion handle row menus.
//
// Items shape:
//   { key, icon?, label, onClick, danger?, disabled?, hidden?, shortcut? }
//   { divider: true }                 // visual separator
//   { type: 'badge', icon?, label }   // non-interactive badge row
//
// Hidden items are filtered out so callers can write conditional menus
// inline (`{ key: 'stop', hidden: !isRunning, ... }` style) without ternaries.
//
// Keyboard:
//   ArrowUp / ArrowDown — move highlight (skipping dividers + badges)
//   Enter / Space — activate
//   Escape — close, return focus to trigger
export default function ActionMenu({ items, ariaLabel = 'Actions', triggerLabel, compact = false }) {
  const triggerRef = useRef(null)
  const [open, setOpen] = useState(false)
  const [activeIdx, setActiveIdx] = useState(-1)

  const interactive = (Array.isArray(items) ? items : []).filter(it => it && !it.divider && it.type !== 'badge' && !it.hidden)
  const visible = (Array.isArray(items) ? items : []).filter(it => it && !it.hidden)

  const close = useCallback(() => {
    setOpen(false)
    setActiveIdx(-1)
  }, [])

  // Move highlight to the first interactive item when opening, so keyboard
  // users land somewhere meaningful instead of having to arrow into the menu.
  useEffect(() => {
    if (open && activeIdx === -1 && interactive.length > 0) {
      setActiveIdx(0)
    }
  }, [open, activeIdx, interactive.length])

  const handleTriggerKeyDown = (e) => {
    if (e.key === 'ArrowDown' || e.key === 'Enter' || e.key === ' ') {
      e.preventDefault()
      e.stopPropagation()
      setOpen(true)
    }
  }

  const handleMenuKeyDown = (e) => {
    if (e.key === 'ArrowDown') {
      e.preventDefault()
      setActiveIdx(i => Math.min(interactive.length - 1, (i < 0 ? -1 : i) + 1))
    } else if (e.key === 'ArrowUp') {
      e.preventDefault()
      setActiveIdx(i => Math.max(0, (i < 0 ? interactive.length : i) - 1))
    } else if (e.key === 'Home') {
      e.preventDefault()
      setActiveIdx(0)
    } else if (e.key === 'End') {
      e.preventDefault()
      setActiveIdx(interactive.length - 1)
    } else if (e.key === 'Enter' || e.key === ' ') {
      e.preventDefault()
      const item = interactive[activeIdx]
      if (item && !item.disabled) {
        close()
        item.onClick?.()
      }
    }
  }

  if (interactive.length === 0 && !visible.some(it => it.type === 'badge')) {
    return null
  }

  return (
    <>
      <button
        ref={triggerRef}
        type="button"
        className={`action-menu__trigger${compact ? ' action-menu__trigger--compact' : ''}${open ? ' is-open' : ''}`}
        aria-haspopup="menu"
        aria-expanded={open}
        aria-label={triggerLabel || ariaLabel}
        onClick={(e) => { e.stopPropagation(); setOpen(v => !v) }}
        onKeyDown={handleTriggerKeyDown}
      >
        <i className="fas fa-ellipsis-vertical" />
      </button>
      <Popover anchor={triggerRef} open={open} onClose={close} ariaLabel={ariaLabel}>
        <div
          role="menu"
          aria-label={ariaLabel}
          className="action-menu"
          onKeyDown={handleMenuKeyDown}
          // Capture focus when the menu opens so arrow keys work without the
          // user clicking inside first.
          tabIndex={-1}
          ref={el => { if (el && open) el.focus() }}
        >
          {visible.map((item, i) => {
            if (item.divider) {
              return <div key={`d-${i}`} className="action-menu__divider" role="separator" />
            }
            if (item.type === 'badge') {
              return (
                <div key={item.key || `b-${i}`} className="action-menu__badge" role="presentation">
                  {item.icon && <i className={`fas ${item.icon}`} aria-hidden="true" />}
                  <span>{item.label}</span>
                </div>
              )
            }
            const idx = interactive.indexOf(item)
            const active = idx === activeIdx
            return (
              <button
                key={item.key}
                type="button"
                role="menuitem"
                disabled={item.disabled}
                className={`action-menu__item${item.danger ? ' is-danger' : ''}${active ? ' is-active' : ''}`}
                onMouseEnter={() => setActiveIdx(idx)}
                onClick={(e) => {
                  e.stopPropagation()
                  if (item.disabled) return
                  close()
                  item.onClick?.()
                }}
              >
                {item.icon && <i className={`fas ${item.icon} action-menu__icon`} aria-hidden="true" />}
                <span className="action-menu__label">{item.label}</span>
                {item.shortcut && <span className="action-menu__shortcut">{item.shortcut}</span>}
              </button>
            )
          })}
        </div>
      </Popover>
    </>
  )
}
|
||||
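The item filtering and highlight movement in `ActionMenu` can be exercised without React. The sketch below pulls that logic into two pure functions; `splitItems` and `nextActiveIdx` are hypothetical names chosen for illustration (the component inlines this logic), and the `count === 0` guard is an addition of this sketch rather than something the component does.

```javascript
// Hypothetical helpers mirroring the filtering and index math in ActionMenu.
// They are illustrative only and are not exported by the component.

// Split raw items into what is rendered and what can receive the highlight.
function splitItems(items) {
  const list = Array.isArray(items) ? items : []
  const visible = list.filter(it => it && !it.hidden)
  const interactive = visible.filter(it => !it.divider && it.type !== 'badge')
  return { visible, interactive }
}

// Compute the next highlighted index for a key press.
// Unknown keys leave the index unchanged; an empty menu yields -1 (sketch-only guard).
function nextActiveIdx(key, idx, count) {
  if (count === 0) return -1
  switch (key) {
    case 'ArrowDown': return Math.min(count - 1, (idx < 0 ? -1 : idx) + 1)
    case 'ArrowUp': return Math.max(0, (idx < 0 ? count : idx) - 1)
    case 'Home': return 0
    case 'End': return count - 1
    default: return idx
  }
}

const isRunning = false
const { visible, interactive } = splitItems([
  { key: 'logs', label: 'Backend logs', onClick: () => {} },
  { divider: true },
  { key: 'stop', label: 'Stop', hidden: !isRunning, onClick: () => {} },
  { type: 'badge', label: 'System' },
  { key: 'delete', label: 'Delete', danger: true, onClick: () => {} },
])

console.log(visible.length, interactive.map(it => it.key))
```

Dividers and badges are rendered but never highlighted, which is why the component keeps two lists: `visible` drives the markup and `interactive` drives the arrow-key index.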
@@ -80,7 +80,7 @@ export default function ClientMCPDropdown({
|
||||
placeholder="Server URL (e.g. https://mcp.example.com/sse)"
|
||||
value={url}
|
||||
onChange={e => setUrl(e.target.value)}
|
||||
style={{ width: '100%', marginBottom: '4px' }}
|
||||
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
|
||||
/>
|
||||
<input
|
||||
type="text"
|
||||
@@ -88,7 +88,7 @@ export default function ClientMCPDropdown({
|
||||
placeholder="Name (optional)"
|
||||
value={name}
|
||||
onChange={e => setName(e.target.value)}
|
||||
style={{ width: '100%', marginBottom: '4px' }}
|
||||
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
|
||||
/>
|
||||
<input
|
||||
type="password"
|
||||
@@ -96,13 +96,13 @@ export default function ClientMCPDropdown({
|
||||
placeholder="Auth token (optional)"
|
||||
value={authToken}
|
||||
onChange={e => setAuthToken(e.target.value)}
|
||||
style={{ width: '100%', marginBottom: '4px' }}
|
||||
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
|
||||
/>
|
||||
<label style={{ display: 'flex', alignItems: 'center', gap: '6px', fontSize: '0.8rem', marginBottom: '6px' }}>
|
||||
<input type="checkbox" checked={useProxy} onChange={e => setUseProxy(e.target.checked)} />
|
||||
Use CORS proxy
|
||||
</label>
|
||||
<div style={{ display: 'flex', gap: '4px', justifyContent: 'flex-end' }}>
|
||||
<div style={{ display: 'flex', gap: 'var(--spacing-xs)', justifyContent: 'flex-end' }}>
|
||||
<button type="button" className="btn btn-sm btn-secondary" onClick={() => setAddDialog(false)}>Cancel</button>
|
||||
<button type="button" className="btn btn-sm btn-primary" onClick={handleAdd} disabled={!url.trim()}>Add</button>
|
||||
</div>
|
||||
|
||||
@@ -135,7 +135,7 @@ function JsonEditor({ value, onChange }) {
|
||||
className="input"
|
||||
value={text}
|
||||
onChange={e => handleChange(e.target.value)}
|
||||
style={{ width: '100%', minHeight: 80, fontFamily: 'monospace', fontSize: '0.8125rem', resize: 'vertical' }}
|
||||
style={{ width: '100%', minHeight: 80, fontFamily: 'var(--font-mono)', fontSize: '0.8125rem', resize: 'vertical' }}
|
||||
/>
|
||||
{parseError && <div style={{ color: 'var(--color-error)', fontSize: '0.75rem', marginTop: 2 }}>{parseError}</div>}
|
||||
</div>
|
||||
|
||||
@@ -158,7 +158,7 @@ export default function FieldBrowser({ fields, activeFieldPaths, onAddField }) {
|
||||
{field.description}
|
||||
</div>
|
||||
)}
|
||||
<div style={{ fontSize: '0.6875rem', color: 'var(--color-text-muted)', marginTop: 1, fontFamily: 'monospace' }}>
|
||||
<div style={{ fontSize: '0.6875rem', color: 'var(--color-text-muted)', marginTop: 1, fontFamily: 'var(--font-mono)' }}>
|
||||
{field.path}
|
||||
</div>
|
||||
</div>
|
||||
|
||||
79
core/http/react-ui/src/components/GalleryLoader.jsx
Normal file
@@ -0,0 +1,79 @@
import { useState, useEffect } from 'react'

const LOADING_PHRASES = [
  { text: 'Loading models...', icon: 'fa-brain' },
  { text: 'Fetching gallery...', icon: 'fa-download' },
  { text: 'Checking availability...', icon: 'fa-circle-check' },
  { text: 'Almost ready...', icon: 'fa-hourglass-half' },
  { text: 'Preparing gallery...', icon: 'fa-store' },
]

// GalleryLoader is the animated skeleton used while the gallery list loads.
// Used by Models, Backends, and (now) the Manage page so an empty fetch state
// reads the same everywhere instead of one tab showing pulsing dots and the
// other showing "Loading...".
export default function GalleryLoader() {
  const [idx, setIdx] = useState(() => Math.floor(Math.random() * LOADING_PHRASES.length))
  const [fade, setFade] = useState(true)

  useEffect(() => {
    const interval = setInterval(() => {
      setFade(false)
      setTimeout(() => {
        setIdx(prev => (prev + 1) % LOADING_PHRASES.length)
        setFade(true)
      }, 300)
    }, 2800)
    return () => clearInterval(interval)
  }, [])

  const phrase = LOADING_PHRASES[idx]

  return (
    <div style={{
      display: 'flex', flexDirection: 'column', alignItems: 'center',
      justifyContent: 'center', padding: 'var(--spacing-xl) var(--spacing-md)',
      minHeight: '280px', gap: 'var(--spacing-lg)',
    }}>
      <div style={{ display: 'flex', gap: 'var(--spacing-sm)' }}>
        {[0, 1, 2, 3, 4].map(i => (
          <div key={i} style={{
            width: 10, height: 10, borderRadius: '50%',
            background: 'var(--color-primary)',
            animation: `galleryDot 1.4s ease-in-out ${i * 0.15}s infinite`,
          }} />
        ))}
      </div>
      <div style={{
        display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)',
        opacity: fade ? 1 : 0,
        transition: 'opacity 300ms ease',
        color: 'var(--color-text-secondary)',
        fontSize: '0.9375rem',
        fontWeight: 500,
      }}>
        <i className={`fas ${phrase.icon}`} style={{ color: 'var(--color-accent)', fontSize: '1.125rem' }} />
        {phrase.text}
      </div>
      <div style={{ width: '100%', maxWidth: '700px', display: 'flex', flexDirection: 'column', gap: '12px' }}>
        {[0.9, 0.7, 0.5].map((opacity, i) => (
          <div key={i} style={{
            height: '48px', borderRadius: 'var(--radius-md)',
            background: 'var(--color-bg-tertiary)', opacity,
            animation: `galleryShimmer 1.8s ease-in-out ${i * 0.2}s infinite`,
          }} />
        ))}
      </div>
      <style>{`
        @keyframes galleryDot {
          0%, 80%, 100% { transform: scale(0.4); opacity: 0.3; }
          40% { transform: scale(1); opacity: 1; }
        }
        @keyframes galleryShimmer {
          0%, 100% { opacity: var(--shimmer-base, 0.15); }
          50% { opacity: var(--shimmer-peak, 0.3); }
        }
      `}</style>
    </div>
  )
}
47
core/http/react-ui/src/components/ManageSummary.jsx
Normal file
@@ -0,0 +1,47 @@
import StatCard from './StatCard'

// ManageSummary anchors the Manage page with the same StatCard pattern the
// Nodes dashboard uses, so the page reads as a real overview rather than
// "two tabs in a hat". Counts are derived in-memory by the parent — this
// component is purely presentational. Cards are clickable and route the
// user to the relevant tab + filter.
export default function ManageSummary({
  modelsCount,
  backendsCount,
  runningCount,
  updatesCount,
  onCardClick,
}) {
  const click = (tab, filter) => onCardClick && onCardClick(tab, filter)

  return (
    <div className="stat-grid manage-summary">
      <StatCard
        icon="fas fa-brain"
        label="Models Installed"
        value={modelsCount}
        onClick={() => click('models', 'all')}
      />
      <StatCard
        icon="fas fa-server"
        label="Backends Installed"
        value={backendsCount}
        onClick={() => click('backends', 'all')}
      />
      <StatCard
        icon="fas fa-circle-play"
        label="Currently Running"
        value={runningCount}
        accentVar={runningCount > 0 ? '--color-success' : undefined}
        onClick={() => click('models', 'running')}
      />
      <StatCard
        icon="fas fa-arrow-up"
        label="Updates Available"
        value={updatesCount}
        accentVar={updatesCount > 0 ? '--color-warning' : undefined}
        onClick={() => click('backends', updatesCount > 0 ? 'upgradable' : 'all')}
      />
    </div>
  )
}
30
core/http/react-ui/src/components/MetaBadgeRow.jsx
Normal file
@@ -0,0 +1,30 @@
// MetaBadgeRow renders the System / User / Meta / Dev badge cluster the same
// way everywhere — Manage tabs and (in future) Install gallery. The badges
// already exist as classes; this component locks down the icons + labels so
// the same backend type doesn't read "User" in one tab and "downloaded" in
// another.
export default function MetaBadgeRow({ isSystem, isMeta, isDevelopment }) {
  return (
    <div className="badge-row">
      {isSystem ? (
        <span className="badge badge-info" title="Bundled with the LocalAI runtime">
          <i className="fas fa-shield-alt" /> System
        </span>
      ) : (
        <span className="badge badge-success" title="Installed from the gallery or external source">
          <i className="fas fa-download" /> User
        </span>
      )}
      {isMeta && (
        <span className="badge badge-accent" title="Meta backend — selects a concrete variant per node">
          <i className="fas fa-layer-group" /> Meta
        </span>
      )}
      {isDevelopment && (
        <span className="badge badge-warning" title="Marked as development / pre-release by the gallery">
          <i className="fas fa-flask" /> Dev
        </span>
      )}
    </div>
  )
}
668
core/http/react-ui/src/components/NodeInstallPicker.jsx
Normal file
@@ -0,0 +1,668 @@
import { useState, useMemo, useEffect, useRef } from 'react'
import Modal from './Modal'
import SearchableSelect from './SearchableSelect'
import { nodesApi } from '../utils/api'

// NodeInstallPicker is the single multi-node install surface used both from
// the Backends gallery split-button and from the "Install on more nodes" `+`
// affordance in the Nodes column. Submit fires N parallel per-node install
// calls; rows transition inline so the user sees per-node success/failure
// without leaving the modal.
//
// Props:
//   open              — controls visibility
//   onClose           — close handler (header X / Cancel / Esc / backdrop)
//   onComplete        — fired after at least one node install succeeded;
//                       gallery uses this to refetch and update the Nodes
//                       column without a manual reload
//   backend           — { name, isMeta, capabilities, metaBackendFor }
//   nodes             — BackendNode[] from /api/nodes
//   allBackends       — gallery backend list, used to look up the parent
//                       meta backend of a concrete variant
//   installedNodeIds  — Set/array of node IDs that already have this backend
//   initialSelection  — optional pre-selected node IDs (e.g. "missing nodes"
//                       when opened from the Nodes column `+` affordance)
//   addToast          — optional (message, level) callback for the summary toast

const STATUS_LABELS = { healthy: 'Healthy', draining: 'Draining', unhealthy: 'Unhealthy', offline: 'Offline' }

function formatVRAM(bytes) {
  if (!bytes || bytes === 0) return null
  const gb = bytes / (1024 * 1024 * 1024)
  return gb >= 1 ? `${gb.toFixed(1)} GB` : `${(bytes / (1024 * 1024)).toFixed(0)} MB`
}

function gpuVendorLabel(vendor) {
  const labels = { nvidia: 'NVIDIA', amd: 'AMD', intel: 'Intel', vulkan: 'Vulkan' }
  return labels[vendor] || null
}

// hardwareTargetOf parses the capability key that points to a concrete
// variant in the parent meta's CapabilitiesMap. e.g. cpu-llama-cpp comes
// from {"cpu": "cpu-llama-cpp"} → "cpu". Falls back to "" when the parent
// is unknown (the gallery list payload still gives us metaBackendFor).
function hardwareTargetOf(backend, allBackends) {
  if (!backend || !backend.name || backend.isMeta) return ''
  const parentName = backend.metaBackendFor
  if (!parentName) return ''
  const parent = (allBackends || []).find(b => b.name === parentName || b.id === parentName)
  if (!parent || !parent.capabilities) return ''
  for (const [cap, concreteName] of Object.entries(parent.capabilities)) {
    if (concreteName === backend.name) return cap
  }
  return ''
}

// humanTargetLabel turns a capability key into a user-facing phrase used in
// the picker header note: "CPU build", "CUDA 12 build", etc. Keep it
// concrete and product-recognisable, not the raw token from the gallery.
function humanTargetLabel(target) {
  if (!target) return 'hardware-specific build'
  const t = target.toLowerCase()
  if (t.startsWith('cpu') || t === 'default') return 'CPU build'
  if (t.includes('cuda-13') || t.includes('cuda13')) return 'CUDA 13 build'
  if (t.includes('cuda-12') || t.includes('cuda12')) return 'CUDA 12 build'
  if (t.includes('cuda')) return 'NVIDIA CUDA build'
  if (t.includes('l4t')) return 'NVIDIA Jetson (L4T) build'
  if (t.includes('nvidia')) return 'NVIDIA build'
  if (t.includes('rocm') || t.includes('amd')) return 'AMD ROCm build'
  if (t.includes('metal')) return 'Apple Metal build'
  if (t.includes('sycl') || t.includes('intel')) return 'Intel SYCL build'
  if (t.includes('vulkan')) return 'Vulkan build'
  if (t.includes('darwin-x86')) return 'macOS x86 build'
  return 'hardware-specific build'
}

// suitabilityFor returns the picker's per-row suitability state for the
// requested backend. Already-installed wins over compatible/override so
// the user sees a single signal per row.
function suitabilityFor({ node, backend, hardwareTarget, alreadyInstalled }) {
  if (alreadyInstalled) return 'installed'
  // backend can be null on the first render before pickerBackend is set —
  // this function is invoked from useMemo, which runs regardless of the
  // outer open guard. Treat missing data as "compatible" so the placeholder
  // render doesn't blow up; the picker won't actually paint anything until
  // the early-return below the hooks fires.
  if (!backend || backend.isMeta || !hardwareTarget) return 'compatible'
  const vendor = (node.gpu_vendor || '').toLowerCase()
  const t = hardwareTarget.toLowerCase()
  if (t.startsWith('cpu') || t === 'default') {
    // CPU builds always run; they're never marked Override (running CPU on a
    // GPU node is the headline use case the user is choosing intentionally).
    return 'compatible'
  }
  if (t.includes('nvidia') || t.includes('cuda') || t.includes('l4t')) {
    return vendor === 'nvidia' ? 'compatible' : 'override'
  }
  if (t.includes('amd') || t.includes('rocm') || t.includes('hip')) {
    return vendor === 'amd' ? 'compatible' : 'override'
  }
  if (t.includes('intel') || t.includes('sycl')) {
    return vendor === 'intel' ? 'compatible' : 'override'
  }
  if (t.includes('metal') || t.includes('darwin')) {
    // No vendor reporting for Metal; trust the user.
    return 'compatible'
  }
  return 'compatible'
}

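The helpers above resolve a concrete variant back to its capability key and then compare that key with the node's reported GPU vendor. The following is a minimal standalone sketch of that idea, not the component's code: `sampleBackends`, `capabilityKeyFor`, and `vendorSuitability` are hypothetical names, the backend names in the sample payload are illustrative assumptions, and only a subset of the vendor rules is reproduced.

```javascript
// Hypothetical gallery payload: one meta backend whose capabilities map
// points each hardware key at a concrete variant. All names are illustrative.
const sampleBackends = [
  {
    name: 'llama-cpp',
    isMeta: true,
    capabilities: { default: 'cpu-llama-cpp', nvidia: 'cuda12-llama-cpp', amd: 'rocm-llama-cpp' },
  },
  { name: 'cuda12-llama-cpp', isMeta: false, metaBackendFor: 'llama-cpp' },
]

// Reverse lookup: find the capability key whose value is this concrete variant.
// Returns '' when the parent meta backend or the mapping is missing.
function capabilityKeyFor(variant, backends) {
  const parent = backends.find(b => b.name === variant.metaBackendFor)
  if (!parent || !parent.capabilities) return ''
  const hit = Object.entries(parent.capabilities).find(([, concrete]) => concrete === variant.name)
  return hit ? hit[0] : ''
}

// Vendor check in the spirit of suitabilityFor (NVIDIA and AMD rules only):
// a GPU-specific key on a node reporting a different vendor is an override.
function vendorSuitability(capKey, gpuVendor) {
  const t = capKey.toLowerCase()
  const vendor = (gpuVendor || '').toLowerCase()
  if (!t || t.startsWith('cpu') || t === 'default') return 'compatible'
  if (t.includes('nvidia') || t.includes('cuda')) return vendor === 'nvidia' ? 'compatible' : 'override'
  if (t.includes('amd') || t.includes('rocm')) return vendor === 'amd' ? 'compatible' : 'override'
  return 'compatible'
}

const key = capabilityKeyFor(sampleBackends[1], sampleBackends)
console.log(key)                              // nvidia
console.log(vendorSuitability(key, 'nvidia')) // compatible
console.log(vendorSuitability(key, 'amd'))    // override
console.log(vendorSuitability(key, ''))       // override
```

A node that reports no vendor is flagged as an override for a GPU-specific build. That matches the picker's intent of asking for confirmation before installing a build the node may not be able to run natively.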
export default function NodeInstallPicker({
  open, onClose, onComplete,
  backend,
  nodes = [],
  allBackends = [],
  installedNodeIds = [],
  initialSelection,
  addToast,
}) {
  const [search, setSearch] = useState('')
  const [showHealthy, setShowHealthy] = useState(true)
  const [showDraining, setShowDraining] = useState(false)
  const [selected, setSelected] = useState(() => new Set())
  const [overrideVariant, setOverrideVariant] = useState('') // chosen concrete name
  const [overrideExpanded, setOverrideExpanded] = useState(false)
  const [submitting, setSubmitting] = useState(false)
  const [showMismatchConfirm, setShowMismatchConfirm] = useState(false)
  // Per-node submission state: { [nodeId]: { status: 'pending'|'installing'|'done'|'error', error?, version? } }
  const [perNode, setPerNode] = useState({})
  const headerInputRef = useRef(null)

  // Backend-derived metadata used throughout the picker.
  const hardwareTarget = useMemo(() => hardwareTargetOf(backend, allBackends), [backend, allBackends])
  const targetLabel = humanTargetLabel(hardwareTarget)
  const concreteVariants = useMemo(() => {
    if (!backend?.isMeta || !backend.capabilities) return []
    return Object.entries(backend.capabilities).map(([cap, concrete]) => ({
      value: concrete,
      label: `${concrete} · ${cap}`,
    }))
  }, [backend])

  // Pending nodes are surgically removed from the list — they can't accept
  // installs until approved. Surface the count instead of dead-disabled rows.
  const pendingCount = nodes.filter(n => n.status === 'pending').length
  const backendNodes = nodes.filter(n =>
    (!n.node_type || n.node_type === 'backend') && n.status !== 'pending'
  )

  const installedSet = useMemo(() => {
    const s = new Set()
    if (Array.isArray(installedNodeIds)) installedNodeIds.forEach(id => s.add(id))
    else if (installedNodeIds && typeof installedNodeIds.has === 'function') {
      installedNodeIds.forEach(id => s.add(id))
    }
    return s
  }, [installedNodeIds])

  const filteredNodes = useMemo(() => {
    let list = backendNodes
    if (!showHealthy) list = list.filter(n => n.status !== 'healthy')
    if (!showDraining) list = list.filter(n => n.status !== 'draining')
    if (search.trim()) {
      const q = search.toLowerCase()
      list = list.filter(n =>
        (n.name || '').toLowerCase().includes(q) ||
        Object.entries(n.labels || {}).some(([k, v]) => `${k}=${v}`.toLowerCase().includes(q))
      )
    }
    return list
  }, [backendNodes, showHealthy, showDraining, search])

  // Pre-seed selection on open. Reset all transient state so reopening
  // doesn't surface ghost progress from the prior submit.
  useEffect(() => {
    if (!open) return
    const initial = new Set()
    if (Array.isArray(initialSelection)) initialSelection.forEach(id => initial.add(id))
    setSelected(initial)
    setSearch('')
    setOverrideVariant('')
    setOverrideExpanded(false)
    setPerNode({})
    setSubmitting(false)
    setShowMismatchConfirm(false)
  }, [open, initialSelection])

  // Auto-expand the variant override disclosure when at least one selected
  // node lacks a working GPU. This is the headline use case the feature
  // exists for; surfacing it instead of hiding behind a click.
  useEffect(() => {
    if (!backend?.isMeta) return
    const someGPUMissing = Array.from(selected).some(id => {
      const n = backendNodes.find(x => x.id === id)
      return n && (!n.gpu_vendor || n.gpu_vendor === '' || n.gpu_vendor === 'unknown')
    })
    if (someGPUMissing && !overrideExpanded) setOverrideExpanded(true)
  }, [selected, backend, backendNodes]) // eslint-disable-line react-hooks/exhaustive-deps

  // The effective backend that gets installed on each node. For
  // hardware-specific backends this is just backend.name. For meta backends
  // with no override, the worker picks per-node — we pass backend.name and
  // the worker resolves. With an override set, the picker installs that
  // exact concrete variant on every selected node.
  const effectiveBackendName = overrideVariant || backend?.name

  const counts = useMemo(() => {
    let already = 0, overrides = 0
    selected.forEach(id => {
      const n = backendNodes.find(x => x.id === id)
      if (!n) return
      if (installedSet.has(id)) { already++; return }
      const eff = overrideVariant
        ? { name: overrideVariant, isMeta: false, metaBackendFor: backend?.name }
        : backend
      const target = overrideVariant ? hardwareTargetOf(eff, allBackends) : hardwareTarget
      const s = suitabilityFor({ node: n, backend: eff, hardwareTarget: target, alreadyInstalled: false })
      if (s === 'override') overrides++
    })
    return { already, overrides, selected: selected.size }
  }, [selected, backendNodes, installedSet, overrideVariant, backend, hardwareTarget, allBackends])

  const toggle = (nodeId) => {
    setSelected(prev => {
      const next = new Set(prev)
      next.has(nodeId) ? next.delete(nodeId) : next.add(nodeId)
      return next
    })
  }

  const selectAllHealthy = () => {
    setSelected(new Set(filteredNodes.filter(n => n.status === 'healthy').map(n => n.id)))
  }
  const selectCompatible = () => {
    const eff = overrideVariant
      ? { name: overrideVariant, isMeta: false, metaBackendFor: backend?.name }
      : backend
    const target = overrideVariant ? hardwareTargetOf(eff, allBackends) : hardwareTarget
    setSelected(new Set(
      filteredNodes
        .filter(n => suitabilityFor({ node: n, backend: eff, hardwareTarget: target, alreadyInstalled: false }) === 'compatible')
        .map(n => n.id)
    ))
  }
  const clearSelection = () => setSelected(new Set())

  const submit = async () => {
    if (selected.size === 0 || submitting) return
    if (counts.overrides > 0 && !showMismatchConfirm) {
      setShowMismatchConfirm(true)
      return
    }
    setShowMismatchConfirm(false)
    setSubmitting(true)
    const ids = Array.from(selected)
    setPerNode(prev => {
      const next = { ...prev }
      ids.forEach(id => { next[id] = { status: 'installing' } })
      return next
    })

    const results = await Promise.allSettled(ids.map(id =>
      nodesApi.installBackend(id, effectiveBackendName)
        .then(r => ({ id, ok: true, message: r?.message }))
        .catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
    ))

    // Tally outcomes from the settled results, not inside the state updater.
    // React may defer or re-run updaters (for example under StrictMode), so
    // counters mutated there are not reliable when they are read below.
    const outcomes = results.filter(r => r.status === 'fulfilled').map(r => r.value)
    const successCount = outcomes.filter(v => v.ok).length
    const failCount = outcomes.length - successCount
    setPerNode(prev => {
      const next = { ...prev }
      for (const v of outcomes) {
        next[v.id] = v.ok ? { status: 'done' } : { status: 'error', error: v.error }
      }
      return next
    })
    setSubmitting(false)

    if (successCount > 0 && onComplete) onComplete()

    if (failCount === 0) {
      addToast?.(`Installed on ${successCount} node${successCount === 1 ? '' : 's'}`, 'success')
      setTimeout(() => onClose?.(), 800)
    } else if (successCount === 0) {
      addToast?.(`Install failed on all ${failCount} node${failCount === 1 ? '' : 's'}`, 'error')
    } else {
      addToast?.(`Installed on ${successCount}, failed on ${failCount}`, 'warning')
    }
  }

  const retryFailed = async () => {
    const failedIds = Object.entries(perNode)
      .filter(([, v]) => v.status === 'error')
      .map(([id]) => id)
    if (failedIds.length === 0) return
    setSelected(new Set(failedIds))
    // Replace state for failed rows so they show "installing" again, not stale errors.
    setPerNode(prev => {
      const next = { ...prev }
      failedIds.forEach(id => { next[id] = { status: 'installing' } })
      return next
    })
    setSubmitting(true)
    const results = await Promise.allSettled(failedIds.map(id =>
      nodesApi.installBackend(id, effectiveBackendName)
        .then(r => ({ id, ok: true, message: r?.message }))
        .catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
    ))
    // Count outcomes outside the state updater; updaters must stay pure
    // because React may defer or re-run them.
    const outcomes = results.filter(r => r.status === 'fulfilled').map(r => r.value)
    const successCount = outcomes.filter(v => v.ok).length
    const failCount = outcomes.length - successCount
    setPerNode(prev => {
      const next = { ...prev }
      for (const v of outcomes) {
        next[v.id] = v.ok ? { status: 'done' } : { status: 'error', error: v.error }
      }
      return next
    })
    setSubmitting(false)
    if (successCount > 0 && onComplete) onComplete()
    if (failCount === 0) {
      addToast?.(`Installed on ${successCount} node${successCount === 1 ? '' : 's'}`, 'success')
      setTimeout(() => onClose?.(), 800)
    }
  }

  const doneCount = Object.values(perNode).filter(v => v.status === 'done').length
  const errorCount = Object.values(perNode).filter(v => v.status === 'error').length
  const totalAttempted = Object.keys(perNode).length

  if (!open || !backend) return null

  const noNodes = backendNodes.length === 0

  return (
    <Modal onClose={onClose} maxWidth="780px">
      <div style={{
        padding: 'var(--spacing-md) var(--spacing-lg)',
        borderBottom: '1px solid var(--color-border-subtle)',
        display: 'flex',
        alignItems: 'center',
        justifyContent: 'space-between',
        gap: 'var(--spacing-sm)',
      }}>
        <h2 style={{ margin: 0, fontSize: '1rem', display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)' }}>
          <i className="fas fa-cog" style={{ color: 'var(--color-primary)' }} />
          Install <span style={{ fontFamily: 'var(--font-mono)' }}>{backend.name}</span>
          {backend.isMeta ? (
            <span className="badge badge-info" style={{ fontSize: '0.6875rem' }}>Auto-resolving</span>
          ) : (
            <span className="badge badge-warning" style={{ fontSize: '0.6875rem' }}>Hardware-specific</span>
          )}
        </h2>
        <button
          type="button"
          className="btn btn-ghost btn-sm"
          onClick={onClose}
          aria-label="Close"
          style={{ fontSize: '1.125rem', lineHeight: 1, padding: '4px 10px' }}
        >×</button>
      </div>

      <div style={{ padding: 'var(--spacing-md) var(--spacing-lg)' }}>
        {!backend.isMeta && (
          <div className="card" style={{
            marginBottom: 'var(--spacing-md)',
            padding: 'var(--spacing-sm) var(--spacing-md)',
            background: 'var(--color-warning-light)',
            border: '1px solid var(--color-warning-border)',
            borderRadius: 'var(--radius-md)',
            display: 'flex',
            alignItems: 'center',
            gap: 'var(--spacing-sm)',
          }}>
            <i className="fas fa-microchip" style={{ color: 'var(--color-warning)' }} />
            <span style={{ color: 'var(--color-warning)', fontSize: '0.8125rem' }}>
              {targetLabel}. Install only on nodes where you want this build to run.
              {hardwareTarget && ` Targets: ${humanTargetLabel(hardwareTarget).replace(' build', '')}.`}
            </span>
          </div>
        )}

        {noNodes ? (
          <div className="empty-state" style={{ padding: 'var(--spacing-xl) 0' }}>
            <div className="empty-state-icon"><i className="fas fa-server" /></div>
            <h3 className="empty-state-title">No backend nodes available</h3>
            <p className="empty-state-text">
              Approve pending workers or register new ones.
              {pendingCount > 0 && ` (${pendingCount} awaiting approval.)`}
            </p>
            <a className="btn btn-secondary btn-sm" href="/app/nodes">
              <i className="fas fa-network-wired" /> Manage nodes
            </a>
          </div>
        ) : (
          <>
            {/* Filter row */}
            <div style={{ display: 'flex', gap: 'var(--spacing-sm)', alignItems: 'center', marginBottom: 'var(--spacing-sm)', flexWrap: 'wrap' }}>
              <div className="search-bar" style={{ flex: 1, minWidth: 180 }}>
                <i className="fas fa-search search-icon" />
                <input
                  ref={headerInputRef}
                  className="input"
                  placeholder="Filter nodes by name or label..."
                  value={search}
                  onChange={e => setSearch(e.target.value)}
                />
              </div>
              <button className="btn btn-secondary btn-sm" onClick={selectAllHealthy} type="button">
                Select all healthy
              </button>
              <button className="btn btn-secondary btn-sm" onClick={selectCompatible} type="button">
                Select compatible nodes
              </button>
              {selected.size > 0 && (
                <button className="btn btn-ghost btn-sm" onClick={clearSelection} type="button">
                  Clear
                </button>
              )}
            </div>

            {/* Variant override (auto-resolving only) */}
            {backend.isMeta && concreteVariants.length > 0 && (
              <div style={{ marginBottom: 'var(--spacing-sm)' }}>
                <button
                  type="button"
                  className="btn btn-ghost btn-sm"
                  onClick={() => setOverrideExpanded(v => !v)}
                  aria-expanded={overrideExpanded}
                  style={{ padding: '4px 8px' }}
                >
                  <i className={`fas fa-chevron-${overrideExpanded ? 'down' : 'right'}`} style={{ marginRight: 4, fontSize: '0.625rem' }} />
                  Override variant for selected nodes…
                </button>
                {overrideExpanded && (
                  <div className="card" style={{ marginTop: 4, padding: 'var(--spacing-sm) var(--spacing-md)' }}>
                    <p style={{ fontSize: '0.75rem', color: 'var(--color-text-secondary)', marginTop: 0, marginBottom: 'var(--spacing-xs)' }}>
                      By default each node picks its own variant. Override to install one specific variant on every selected node — useful when GPU detection fails on a node and you want the CPU build there instead.
                    </p>
                    <SearchableSelect
                      value={overrideVariant}
                      onChange={setOverrideVariant}
                      options={concreteVariants}
                      placeholder="Per-node auto-resolve (default)"
                      allOption={{ value: '', label: 'Per-node auto-resolve (default)' }}
                    />
                  </div>
                )}
              </div>
            )}

            {/* Node table */}
            <div className="table-container" style={{ marginBottom: 'var(--spacing-sm)', maxHeight: '40vh', overflowY: 'auto' }}>
              <table className="table" style={{ margin: 0 }}>
                <thead>
                  <tr>
                    <th style={{ width: 28 }}>
                      <input
                        type="checkbox"
                        aria-label="Select all visible"
                        checked={filteredNodes.length > 0 && filteredNodes.every(n => selected.has(n.id))}
                        onChange={(e) => {
                          setSelected(prev => {
                            const next = new Set(prev)
                            if (e.target.checked) filteredNodes.forEach(n => next.add(n.id))
                            else filteredNodes.forEach(n => next.delete(n.id))
                            return next
                          })
                        }}
                      />
                    </th>
                    <th>Node</th>
                    <th>Status</th>
                    <th>Hardware</th>
                    <th>Suitability</th>
                  </tr>
                </thead>
                <tbody>
                  {filteredNodes.map(node => {
                    const installed = installedSet.has(node.id)
                    const eff = overrideVariant
                      ? { name: overrideVariant, isMeta: false, metaBackendFor: backend.name }
                      : backend
                    const target = overrideVariant ? hardwareTargetOf(eff, allBackends) : hardwareTarget
                    const suit = suitabilityFor({ node, backend: eff, hardwareTarget: target, alreadyInstalled: installed })
                    const isSel = selected.has(node.id)
                    const rowState = perNode[node.id]
                    const vendor = gpuVendorLabel(node.gpu_vendor)
                    const totalVRAM = formatVRAM(node.total_vram)
                    const totalRAM = formatVRAM(node.total_ram)
                    return (
                      <tr key={node.id}>
                        <td>
                          <input
                            type="checkbox"
                            aria-label={`Select ${node.name}`}
                            aria-disabled={rowState?.status === 'installing'}
                            checked={isSel}
                            onChange={() => toggle(node.id)}
                          />
                        </td>
                        <td>
                          <div style={{ display: 'flex', flexDirection: 'column', gap: 2 }}>
                            <span style={{ fontWeight: 500, fontSize: '0.875rem' }}>{node.name}</span>
                            {node.labels && Object.keys(node.labels).length > 0 && (
                              <div style={{ display: 'flex', flexWrap: 'wrap', gap: 3 }}>
                                {Object.entries(node.labels).slice(0, 3).map(([k, v]) => (
                                  <span key={k} className="cell-mono" style={{
                                    padding: '1px 5px', borderRadius: 'var(--radius-sm)', fontSize: '0.6875rem',
                                    background: 'var(--color-bg-tertiary)', border: '1px solid var(--color-border-subtle)',
                                  }}>{k}={v}</span>
                                ))}
                                {Object.keys(node.labels).length > 3 && (
                                  <span className="cell-muted" style={{ fontSize: '0.6875rem' }}>
                                    +{Object.keys(node.labels).length - 3}
                                  </span>
                                )}
                              </div>
                            )}
                          </div>
                        </td>
                        <td>
                          <span style={{ fontSize: '0.8125rem' }}>
                            {STATUS_LABELS[node.status] || node.status}
                          </span>
                        </td>
                        <td style={{ fontSize: '0.8125rem', fontFamily: 'var(--font-mono)', color: 'var(--color-text-secondary)' }}>
                          {totalVRAM ? (
                            <>{vendor && <span style={{ marginRight: 4 }}>{vendor}</span>}{totalVRAM}</>
                          ) : totalRAM ? (
                            <span>CPU · {totalRAM}</span>
                          ) : <span className="cell-muted">—</span>}
                        </td>
                        <td>
                          {rowState?.status === 'installing' ? (
                            <span className="badge badge-info">
                              <i className="fas fa-spinner fa-spin" style={{ marginRight: 4 }} />Installing
                            </span>
                          ) : rowState?.status === 'done' ? (
                            <span className="badge badge-success">
                              <i className="fas fa-check" style={{ marginRight: 4 }} />Installed
                            </span>
                          ) : rowState?.status === 'error' ? (
                            <button
                              type="button"
                              className="badge badge-error"
                              title={rowState.error}
                              aria-describedby={`err-${node.id}`}
                              style={{ border: 'none', cursor: 'help' }}
                            >
                              <i className="fas fa-exclamation-triangle" style={{ marginRight: 4 }} />Failed
                              <span id={`err-${node.id}`} style={{ position: 'absolute', left: -9999 }}>{rowState.error}</span>
                            </button>
                          ) : suit === 'installed' ? (
                            <span className="badge" style={{ background: 'var(--color-bg-tertiary)', color: 'var(--color-text-muted)' }}>
                              Installed
                            </span>
                          ) : suit === 'override' ? (
                            <span className="badge badge-warning">
                              <i className="fas fa-exclamation-circle" style={{ marginRight: 4 }} />Override
                            </span>
                          ) : (
                            <span className="badge badge-success" style={{ background: 'var(--color-success-light)', color: 'var(--color-success)' }}>
                              Compatible
                            </span>
                          )}
                        </td>
                      </tr>
                    )
                  })}
                  {filteredNodes.length === 0 && (
                    <tr>
                      <td colSpan={5} style={{ textAlign: 'center', padding: 'var(--spacing-md)', color: 'var(--color-text-muted)' }}>
                        No nodes match the current filters.
                      </td>
                    </tr>
                  )}
                </tbody>
              </table>
            </div>

            {pendingCount > 0 && (
              <p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 0, marginBottom: 'var(--spacing-sm)' }}>
                +{pendingCount} awaiting approval — <a href="/app/nodes" style={{ color: 'var(--color-primary)' }}>approve from Nodes</a>.
              </p>
            )}

            {/* Mismatch confirm */}
            {showMismatchConfirm && (
              <div className="card" style={{
                marginBottom: 'var(--spacing-sm)',
                padding: 'var(--spacing-md)',
                background: 'var(--color-warning-light)',
                border: '1px solid var(--color-warning-border)',
                borderRadius: 'var(--radius-md)',
              }}>
                <p style={{ marginTop: 0, marginBottom: 'var(--spacing-sm)', color: 'var(--color-warning)', fontSize: '0.875rem' }}>
                  Installing {targetLabel.toLowerCase()} on {counts.overrides} node{counts.overrides === 1 ? '' : 's'} that don't match. Those nodes will run inference on the chosen build, not their native GPU. Continue?
                </p>
                <div style={{ display: 'flex', gap: 'var(--spacing-sm)', justifyContent: 'flex-end' }}>
                  <button className="btn btn-secondary btn-sm" type="button" onClick={() => setShowMismatchConfirm(false)}>
                    Cancel
                  </button>
                  <button className="btn btn-primary btn-sm" type="button" onClick={submit}
                    style={{ background: 'var(--color-warning)', borderColor: 'var(--color-warning)' }}>
                    Install on {targetLabel.replace(' build', '')}
                  </button>
                </div>
              </div>
            )}
          </>
        )}
      </div>

      {!noNodes && (
        <div style={{
          padding: 'var(--spacing-md) var(--spacing-lg)',
          borderTop: '1px solid var(--color-border-subtle)',
          display: 'flex',
          alignItems: 'center',
          gap: 'var(--spacing-sm)',
          flexWrap: 'wrap',
        }}>
          <div style={{ flex: 1, fontSize: '0.8125rem', color: 'var(--color-text-secondary)' }}>
            {totalAttempted > 0 ? (
              <>
                {doneCount} of {totalAttempted} done
                {errorCount > 0 && (
                  <> · <span className="badge badge-error" style={{ fontSize: '0.6875rem' }}>{errorCount} failed</span></>
                )}
              </>
            ) : (
              <>
                {counts.selected} {counts.selected === 1 ? 'node' : 'nodes'} selected
                {counts.already > 0 && <> · {counts.already} already installed</>}
                {counts.overrides > 0 && <> · {counts.overrides} override{counts.overrides === 1 ? '' : 's'}</>}
              </>
            )}
          </div>
          {errorCount > 0 && !submitting && (
            <button className="btn btn-secondary btn-sm" type="button" onClick={retryFailed}>
              <i className="fas fa-redo" /> Retry failed nodes
            </button>
          )}
          <button className="btn btn-secondary btn-sm" type="button" onClick={onClose} disabled={submitting}>
            {totalAttempted > 0 && doneCount > 0 ? 'Close' : 'Cancel'}
          </button>
          <button
            className="btn btn-primary btn-sm"
            type="button"
            onClick={submit}
            disabled={submitting || counts.selected === 0 || showMismatchConfirm}
          >
            {submitting ? (
              <><i className="fas fa-spinner fa-spin" /> Installing…</>
            ) : (
              <>Install on {counts.selected} {counts.selected === 1 ? 'node' : 'nodes'}</>
            )}
          </button>
        </div>
      )}
    </Modal>
  )
}
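The submit path in this component fans out one install call per selected node and folds each outcome into per-node state plus a success/failure tally. The sketch below isolates that pattern outside React so it can be run on its own. `fakeInstall` and `installOnNodes` are hypothetical names, `fakeInstall` merely stands in for `nodesApi.installBackend`, and the node IDs and backend name are made up for illustration. The tally is computed from the settled results directly, not inside a state updater.

```javascript
// fakeInstall is a hypothetical stand-in for nodesApi.installBackend; it
// rejects for any node ID listed in `failing` and resolves otherwise.
function fakeInstall(nodeId, backendName, failing) {
  return failing.has(nodeId)
    ? Promise.reject(new Error(`no route to ${nodeId}`))
    : Promise.resolve({ message: `${backendName} installed` })
}

// Fan out one install per node and fold every outcome into a plain value,
// so a rejected call never hides the results of the others.
async function installOnNodes(ids, backendName, install) {
  const settled = await Promise.allSettled(ids.map(id =>
    install(id, backendName)
      .then(r => ({ id, ok: true, message: r?.message }))
      .catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
  ))
  // Every promise above resolves (the .catch converts rejections), so each
  // settled entry is expected to be 'fulfilled'; the filter is defensive.
  const outcomes = settled.filter(r => r.status === 'fulfilled').map(r => r.value)
  const perNode = {}
  for (const o of outcomes) {
    perNode[o.id] = o.ok ? { status: 'done' } : { status: 'error', error: o.error }
  }
  const successCount = outcomes.filter(o => o.ok).length
  return { perNode, successCount, failCount: outcomes.length - successCount }
}

const failing = new Set(['node-b'])
installOnNodes(
  ['node-a', 'node-b', 'node-c'],
  'cpu-llama-cpp',
  (id, name) => fakeInstall(id, name, failing),
).then(s => console.log(s.successCount, s.failCount, s.perNode['node-b'].status))
// prints: 2 1 error
```

Returning the tally alongside the per-node map keeps the counting logic pure, which makes it straightforward to reuse for both the first submit and a retry of the failed nodes.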
29
core/http/react-ui/src/components/ResourceActions.jsx
Normal file
@@ -0,0 +1,29 @@
// ResourceActions groups row-level buttons into a lifecycle cluster (start,
// stop, pin, reinstall, upgrade) and a destructive cluster (delete) with a
// thin divider between them, so a destructive intent visually separates from
// a routine one. Replaces the old 4px-gap row of buttons in the Manage page
// where Stop / Pin / Delete sat shoulder-to-shoulder with no visual cue
// telling apart "click to fiddle" from "click to throw away".
//
// `lifecycle` and `destructive` accept any ReactNode — typically one or more
// <button>s. The wrapping div stops click propagation so action clicks don't
// also expand the row.
export default function ResourceActions({ lifecycle, destructive }) {
  const hasLifecycle = !!lifecycle
  const hasDestructive = !!destructive
  if (!hasLifecycle && !hasDestructive) return null

  return (
    <div className="resource-actions" onClick={e => e.stopPropagation()}>
      {hasLifecycle && (
        <div className="resource-actions__group">{lifecycle}</div>
      )}
      {hasLifecycle && hasDestructive && (
        <span className="resource-actions__divider" aria-hidden="true" />
      )}
      {hasDestructive && (
        <div className="resource-actions__group">{destructive}</div>
      )}
    </div>
  )
}
@@ -51,7 +51,7 @@ export default function ResourceMonitor() {
   <div className="resource-bar-container" style={{ flex: 1 }}>
     <div className="resource-bar" style={{ width: `${pct}%`, background: color }} />
   </div>
-  <span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: "'JetBrains Mono', monospace", color, minWidth: '3em', textAlign: 'right' }}>
+  <span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: 'var(--font-mono)', color, minWidth: '3em', textAlign: 'right' }}>
     {pct.toFixed(0)}%
   </span>
 </div>
@@ -76,7 +76,7 @@ export default function ResourceMonitor() {
   <div className="resource-bar-container" style={{ flex: 1 }}>
     <div className="resource-bar" style={{ width: `${ram.usage_percent || 0}%`, background: percentColor(ram.usage_percent || 0) }} />
   </div>
-  <span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: "'JetBrains Mono', monospace", color: percentColor(ram.usage_percent || 0), minWidth: '3em', textAlign: 'right' }}>
+  <span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: 'var(--font-mono)', color: percentColor(ram.usage_percent || 0), minWidth: '3em', textAlign: 'right' }}>
     {(ram.usage_percent || 0).toFixed(0)}%
   </span>
 </div>
@@ -91,7 +91,7 @@ export default function ResourceMonitor() {
 {isGpu && aggregate.gpu_count > 1 && (
   <div style={{ fontSize: '0.75rem', color: 'var(--color-text-secondary)', marginTop: 'var(--spacing-sm)', display: 'flex', justifyContent: 'space-between' }}>
     <span>Total VRAM</span>
-    <span style={{ fontFamily: "'JetBrains Mono', monospace" }}>
+    <span style={{ fontFamily: 'var(--font-mono)' }}>
       {formatBytes(aggregate.used_memory)} / {formatBytes(aggregate.total_memory)} ({aggregate.usage_percent?.toFixed(1)}%)
     </span>
   </div>
@@ -101,7 +101,7 @@ export default function ResourceMonitor() {
 {resources.storage_size != null && (
   <div style={{ fontSize: '0.75rem', color: 'var(--color-text-secondary)', marginTop: 'var(--spacing-sm)', display: 'flex', justifyContent: 'space-between' }}>
     <span>Models storage</span>
-    <span style={{ fontFamily: "'JetBrains Mono', monospace", color: 'var(--color-text-primary)' }}>
+    <span style={{ fontFamily: 'var(--font-mono)', color: 'var(--color-text-primary)' }}>
       {formatBytes(resources.storage_size)}
     </span>
   </div>
81
core/http/react-ui/src/components/ResourceRow.jsx
Normal file
@@ -0,0 +1,81 @@
import { Fragment } from 'react'

// ResourceRow renders the visible row + its conditional detail row as a pair
// of <tr>s, so the existing .table styling keeps applying and the Manage page
// can re-use the gallery's expand-to-detail interaction without inventing a
// new table system. The consumer owns the cells (which pass through as
// children) — this component only manages the click-to-expand handler, the
// dimmed state for disabled rows, and the colSpan'd detail row beneath.
//
// `onToggleExpand` fires on row click only. Buttons / toggles inside cells
// must call e.stopPropagation() (or be wrapped in an .actions-stop wrapper)
// to avoid double-triggering the expand.
export default function ResourceRow({
  expanded,
  onToggleExpand,
  detail,
  colSpan,
  dimmed,
  className = '',
  children,
}) {
  return (
    <Fragment>
      <tr
        className={`resource-row${dimmed ? ' is-dimmed' : ''}${expanded ? ' is-expanded' : ''} ${className}`.trim()}
        onClick={onToggleExpand}
        style={{ cursor: onToggleExpand ? 'pointer' : 'default' }}
      >
        {children}
      </tr>
      {expanded && detail && (
        <tr className="resource-row__detail-row">
          <td colSpan={colSpan} className="resource-row__detail-cell">
            {detail}
          </td>
        </tr>
      )}
    </Fragment>
  )
}

// ChevronCell is the small rotating chevron used as the leftmost cell of an
// expandable row. Mirrors the Nodes/Models/Backends gallery affordance so
// users see the same "click to expand" cue everywhere.
export function ChevronCell({ expanded }) {
  return (
    <td className="resource-row__chevron-cell">
      <span className={`row-chevron${expanded ? ' is-expanded' : ''}`} aria-hidden="true">
        <i className="fas fa-chevron-right" />
      </span>
    </td>
  )
}

// IconCell renders the 48px brand icon shell — the same one the Install
// gallery uses. `icon` is the image URL (from gallery metadata); when absent
// or broken we fall back to a FontAwesome glyph so custom-imported items
// still get a placeholder instead of an empty square.
export function IconCell({ icon, fallback = 'fa-cube', alt = '' }) {
  return (
    <td className="resource-row__icon-cell">
      <div className="resource-row__icon">
        {icon ? (
          <img src={icon} alt={alt} loading="lazy" />
        ) : (
          <i className={`fas ${fallback}`} />
        )}
      </div>
    </td>
  )
}

// StopPropagationCell wraps cell contents that contain interactive controls
// (Toggle, action buttons) so a click on them doesn't also expand the row.
export function StopPropagationCell({ children, ...props }) {
  return (
    <td {...props} onClick={e => e.stopPropagation()}>
      {children}
    </td>
  )
}
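The class string that `ResourceRow` builds for its `<tr>` can be checked in isolation. The sketch below lifts the same template literal into a standalone function; `rowClassName` is a hypothetical helper introduced only for illustration and does not exist in the component.

```javascript
// Hypothetical helper (not part of ResourceRow.jsx) that mirrors the
// className expression used on the <tr>.
function rowClassName({ dimmed = false, expanded = false, className = '' } = {}) {
  // .trim() removes the trailing space left behind when className is empty.
  return `resource-row${dimmed ? ' is-dimmed' : ''}${expanded ? ' is-expanded' : ''} ${className}`.trim()
}

console.log(rowClassName())                                  // resource-row
console.log(rowClassName({ dimmed: true, expanded: true }))  // resource-row is-dimmed is-expanded
console.log(rowClassName({ className: 'backend-row' }))      // resource-row backend-row
```

The trailing `.trim()` is what keeps the default case from rendering `"resource-row "` with a dangling space.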
@@ -116,7 +116,7 @@ export default function SearchableSelect({
        aria-expanded={open}
        onClick={() => { if (!disabled) { setOpen(!open); setQuery(''); setFocusIndex(-1) } }}
        style={{
          width: '100%', padding: '4px 8px', fontSize: '0.8125rem',
          width: '100%', padding: 'var(--spacing-xs) var(--spacing-sm)', fontSize: '0.8125rem',
          cursor: disabled ? 'not-allowed' : 'pointer',
          display: 'flex', alignItems: 'center', gap: '6px',
          background: 'var(--color-bg-primary)', border: '1px solid var(--color-border)',
@@ -145,7 +145,7 @@ export default function SearchableSelect({
              value={query}
              onChange={(e) => { setQuery(e.target.value); setFocusIndex(-1) }}
              onKeyDown={handleKeyDown}
              style={{ width: '100%', padding: '4px 8px', fontSize: '0.8125rem' }}
              style={{ width: '100%', padding: 'var(--spacing-xs) var(--spacing-sm)', fontSize: '0.8125rem' }}
            />
          </div>
          <div ref={listRef} role="listbox" style={{ overflowY: 'auto', maxHeight: 'min(200px, 50vh)' }}>
@@ -1,4 +1,4 @@
import { useState, useEffect } from 'react'
import { useState, useEffect, useRef } from 'react'
import { NavLink, useNavigate, useLocation } from 'react-router-dom'
import ThemeToggle from './ThemeToggle'
import { useAuth } from '../context/AuthContext'
@@ -24,6 +24,18 @@ const sections = [
      { path: '/app/quantize', icon: 'fas fa-compress', label: 'Quantize (Experimental)', feature: 'quantization' },
    ],
  },
  {
    id: 'biometrics',
    title: 'Biometrics',
    featureMap: {
      '/app/face': 'face_recognition',
      '/app/voice': 'voice_recognition',
    },
    items: [
      { path: '/app/face', icon: 'fas fa-face-smile', label: 'Face Recognition', feature: 'face_recognition' },
      { path: '/app/voice', icon: 'fas fa-microphone-lines', label: 'Voice Recognition', feature: 'voice_recognition' },
    ],
  },
  {
    id: 'agents',
    title: 'Agents',
@@ -95,11 +107,22 @@ export default function Sidebar({ isOpen, onClose }) {
  const { isAdmin, authEnabled, user, logout, hasFeature } = useAuth()
  const navigate = useNavigate()
  const location = useLocation()
  const closeBtnRef = useRef(null)

  useEffect(() => {
    fetch(apiUrl('/api/features')).then(r => r.json()).then(setFeatures).catch(() => {})
  }, [])

  // Move focus into the drawer when opened on mobile/tablet so keyboard
  // and screen-reader users land inside the dialog. Targeting the close
  // button avoids hijacking the visual focus to a nav item the user may
  // not have meant to activate.
  useEffect(() => {
    if (!isOpen) return
    const id = window.requestAnimationFrame(() => closeBtnRef.current?.focus())
    return () => window.cancelAnimationFrame(id)
  }, [isOpen])

  // Auto-expand section containing the active route
  useEffect(() => {
    for (const section of sections) {
@@ -156,7 +179,11 @@ export default function Sidebar({ isOpen, onClose }) {
    <>
      {isOpen && <div className="sidebar-overlay" onClick={onClose} />}

      <aside className={`sidebar ${isOpen ? 'open' : ''} ${collapsed ? 'collapsed' : ''}`}>
      <aside
        id="app-sidebar"
        className={`sidebar ${isOpen ? 'open' : ''} ${collapsed ? 'collapsed' : ''}`}
        aria-label="Primary navigation"
      >
        {/* Logo */}
        <div className="sidebar-header">
          <a href="./" className="sidebar-logo-link">
@@ -165,8 +192,13 @@ export default function Sidebar({ isOpen, onClose }) {
          <a href="./" className="sidebar-logo-icon" title="LocalAI">
            <img src={apiUrl('/static/logo.png')} alt="LocalAI" className="sidebar-logo-icon-img" />
          </a>
          <button className="sidebar-close-btn" onClick={onClose} aria-label="Close menu">
            <i className="fas fa-times" />
          <button
            ref={closeBtnRef}
            className="sidebar-close-btn"
            onClick={onClose}
            aria-label="Close menu"
          >
            <i className="fas fa-times" aria-hidden="true" />
          </button>
        </div>
39
core/http/react-ui/src/components/StatCard.jsx
Normal file
@@ -0,0 +1,39 @@
// StatCard renders a single cluster/dashboard metric card. The left accent
// bar + icon chip color is driven by `accentVar` (a CSS custom property name,
// e.g. "--color-success") so the card reads as semantic without the caller
// having to reach into colors directly. `onClick` upgrades the card to a
// keyboard-focusable button — used by the Manage page so cards double as
// shortcuts to the relevant tab + filter.
export default function StatCard({ icon, label, value, color, accentVar, onClick }) {
  const accent = color || (accentVar ? `var(${accentVar})` : 'var(--color-text-primary)')
  const interactive = typeof onClick === 'function'

  const handleKeyDown = interactive
    ? (e) => {
        if (e.key === 'Enter' || e.key === ' ') {
          e.preventDefault()
          onClick(e)
        }
      }
    : undefined

  return (
    <div
      className="stat-card"
      data-clickable={interactive ? 'true' : undefined}
      role={interactive ? 'button' : undefined}
      tabIndex={interactive ? 0 : undefined}
      onClick={interactive ? onClick : undefined}
      onKeyDown={handleKeyDown}
      style={accentVar ? { ['--stat-accent']: `var(${accentVar})` } : undefined}
    >
      <div className="stat-card__body">
        <div className="stat-card__label">{label}</div>
        <div className="stat-card__value" style={{ color: accent }}>{value}</div>
      </div>
      <div className="stat-card__icon" style={accentVar ? { color: accent } : undefined}>
        <i className={icon} />
      </div>
    </div>
  )
}
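To make the accent fallback and the keyboard activation in `StatCard` easier to reason about, the following sketch extracts both into plain functions. `resolveAccent` and `makeKeyHandler` are hypothetical names used for illustration only; the component inlines this logic.

```javascript
// Hypothetical extraction of StatCard's accent resolution:
// explicit color wins, then the CSS variable, then the default text color.
function resolveAccent(color, accentVar) {
  return color || (accentVar ? `var(${accentVar})` : 'var(--color-text-primary)')
}

// Hypothetical extraction of the keydown handler: only Enter and Space
// activate the card, matching native button behaviour.
function makeKeyHandler(onClick) {
  if (typeof onClick !== 'function') return undefined
  return (e) => {
    if (e.key === 'Enter' || e.key === ' ') {
      e.preventDefault() // stop Space from scrolling the page
      onClick(e)
    }
  }
}

// Minimal stand-in for a keyboard event, enough to drive the handler.
let clicks = 0
const handler = makeKeyHandler(() => { clicks += 1 })
handler({ key: 'Enter', preventDefault() {} })
handler({ key: 'a', preventDefault() {} })
console.log(clicks)                                  // 1
console.log(resolveAccent(undefined, '--color-success')) // var(--color-success)
```

A non-function `onClick` yields no handler at all, which is why the card only receives `role="button"` and `tabIndex` when it is actually interactive.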
@@ -24,7 +24,7 @@ export default function TemplateSelector({ onSelect }) {
          <p style={{ fontSize: '0.8125rem', color: 'var(--color-text-secondary)', lineHeight: 1.5, margin: 0 }}>
            {t.description}
          </p>
          <div style={{ display: 'flex', flexWrap: 'wrap', gap: '4px', marginTop: 'var(--spacing-xs)' }}>
          <div style={{ display: 'flex', flexWrap: 'wrap', gap: 'var(--spacing-xs)', marginTop: 'var(--spacing-xs)' }}>
            {Object.keys(t.fields).filter(k => k !== 'name').map(k => (
              <span key={k} className="badge" style={{
                fontSize: '0.6875rem', background: 'var(--color-bg-tertiary)',
@@ -187,7 +187,7 @@ export default function UnifiedMCPDropdown({
            placeholder="Server URL (e.g. https://mcp.example.com/sse)"
            value={url}
            onChange={e => setUrl(e.target.value)}
            style={{ width: '100%', marginBottom: '4px' }}
            style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
          />
          <input
            type="text"
@@ -195,7 +195,7 @@ export default function UnifiedMCPDropdown({
            placeholder="Name (optional)"
            value={name}
            onChange={e => setName(e.target.value)}
            style={{ width: '100%', marginBottom: '4px' }}
            style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
          />
          <input
            type="password"
@@ -203,13 +203,13 @@ export default function UnifiedMCPDropdown({
            placeholder="Auth token (optional)"
            value={authToken}
            onChange={e => setAuthToken(e.target.value)}
            style={{ width: '100%', marginBottom: '4px' }}
            style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
          />
          <label style={{ display: 'flex', alignItems: 'center', gap: '6px', fontSize: '0.8rem', marginBottom: '6px' }}>
            <input type="checkbox" checked={useProxy} onChange={e => setUseProxy(e.target.checked)} />
            Use CORS proxy
          </label>
          <div style={{ display: 'flex', gap: '4px', justifyContent: 'flex-end' }}>
          <div style={{ display: 'flex', gap: 'var(--spacing-xs)', justifyContent: 'flex-end' }}>
            <button type="button" className="btn btn-sm btn-secondary" onClick={() => setAddDialog(false)}>Cancel</button>
            <button type="button" className="btn btn-sm btn-primary" onClick={handleAddClient} disabled={!url.trim()}>Add</button>
          </div>
@@ -0,0 +1,63 @@
import { useEffect, useRef, useState } from 'react'

// BoundingBoxCanvas — overlay face-detection rectangles on the user-supplied image.
// boxes: [{ x, y, w, h, label?, sublabel?, tone? }]
// tone: 'default' | 'success' | 'warning' | 'error' | 'accent'
export default function BoundingBoxCanvas({ src, boxes = [], alt = '' }) {
  const wrapRef = useRef(null)
  const imgRef = useRef(null)
  const [dims, setDims] = useState({ w: 0, h: 0, natW: 0, natH: 0 })

  useEffect(() => {
    const update = () => {
      if (!wrapRef.current || !imgRef.current) return
      const rect = imgRef.current.getBoundingClientRect()
      setDims({
        w: rect.width,
        h: rect.height,
        natW: imgRef.current.naturalWidth || 1,
        natH: imgRef.current.naturalHeight || 1,
      })
    }
    update()
    const ro = new ResizeObserver(update)
    if (imgRef.current) ro.observe(imgRef.current)
    window.addEventListener('resize', update)
    return () => {
      ro.disconnect()
      window.removeEventListener('resize', update)
    }
  }, [src])

  const sx = dims.natW ? dims.w / dims.natW : 1
  const sy = dims.natH ? dims.h / dims.natH : 1

  return (
    <div ref={wrapRef} className="biometrics-bbox">
      {src && <img ref={imgRef} src={src} alt={alt} onLoad={(e) => {
        setDims({
          w: e.target.getBoundingClientRect().width,
          h: e.target.getBoundingClientRect().height,
          natW: e.target.naturalWidth,
          natH: e.target.naturalHeight,
        })
      }} />}
      {boxes.map((b, i) => (
        <div key={i} className={`biometrics-bbox__box tone-${b.tone || 'accent'}`}
          style={{
            left: `${b.x * sx}px`,
            top: `${b.y * sy}px`,
            width: `${b.w * sx}px`,
            height: `${b.h * sy}px`,
          }}>
          {(b.label || b.sublabel) && (
            <div className="biometrics-bbox__tag">
              {b.label && <strong>{b.label}</strong>}
              {b.sublabel && <span>{b.sublabel}</span>}
            </div>
          )}
        </div>
      ))}
    </div>
  )
}
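The overlay positions in `BoundingBoxCanvas` come from scaling box coordinates, given in the image's natural pixel space, by the ratio of rendered size to natural size. A minimal sketch of that arithmetic follows; `scaleBox` is a hypothetical helper and the sample dimensions are made up for illustration.

```javascript
// Hypothetical helper mirroring the sx / sy scaling in BoundingBoxCanvas.
// dims: { w, h } is the rendered size, { natW, natH } the natural image size.
function scaleBox(box, dims) {
  // Fall back to a 1:1 scale while the natural size is still unknown (0).
  const sx = dims.natW ? dims.w / dims.natW : 1
  const sy = dims.natH ? dims.h / dims.natH : 1
  return {
    left: box.x * sx,
    top: box.y * sy,
    width: box.w * sx,
    height: box.h * sy,
  }
}

// An 800x600 image rendered at 400x300 halves every coordinate.
const dims = { w: 400, h: 300, natW: 800, natH: 600 }
console.log(scaleBox({ x: 100, y: 50, w: 200, h: 120 }, dims))
// { left: 50, top: 25, width: 100, height: 60 }
```

Because the scale is recomputed from the rendered rectangle, the boxes stay aligned when the image is resized by CSS, which is what the `ResizeObserver` in the component is there to track.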
@@ -0,0 +1,33 @@
// DistributionBars — one horizontal bar per label, width proportional to value.
// distribution: Record<string, number> (values are probabilities 0..1 or any positive scale).
// dominant: string — highlighted row.
export default function DistributionBars({ title, distribution, dominant, icon }) {
  if (!distribution || Object.keys(distribution).length === 0) return null
  const entries = Object.entries(distribution).sort((a, b) => b[1] - a[1])
  const max = entries.reduce((m, [, v]) => Math.max(m, v), 0) || 1

  return (
    <div className="biometrics-dist card">
      <div className="biometrics-dist__head">
        {icon && <i className={icon} aria-hidden="true" />}
        <h3>{title}</h3>
        {dominant && <span className="biometrics-dist__dominant">{dominant}</span>}
      </div>
      <ul className="biometrics-dist__rows">
        {entries.map(([label, value]) => {
          const pct = (value / max) * 100
          const isDominant = label === dominant
          return (
            <li key={label} className={`biometrics-dist__row ${isDominant ? 'dominant' : ''}`}>
              <span className="biometrics-dist__label">{label}</span>
              <div className="biometrics-dist__bar-wrap" aria-hidden="true">
                <div className="biometrics-dist__bar" style={{ width: `${pct}%` }} />
              </div>
              <span className="biometrics-dist__value">{(value * 100).toFixed(1)}%</span>
            </li>
          )
        })}
      </ul>
    </div>
  )
}
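The row data that `DistributionBars` renders can be reproduced without React. The sketch below applies the same sort, max-normalisation, and label formatting; `distributionRows` is a hypothetical helper and the sample labels are invented.

```javascript
// Hypothetical helper mirroring the per-row math in DistributionBars.
function distributionRows(distribution) {
  // Largest value first, as in the component.
  const entries = Object.entries(distribution).sort((a, b) => b[1] - a[1])
  // `|| 1` avoids dividing by zero when every value is 0.
  const max = entries.reduce((m, [, v]) => Math.max(m, v), 0) || 1
  return entries.map(([label, value]) => ({
    label,
    widthPct: (value / max) * 100,          // bar width, relative to the largest value
    text: `${(value * 100).toFixed(1)}%`,   // printed value, assumes a 0..1 input
  }))
}

console.log(distributionRows({ neutral: 0.25, happy: 0.5, sad: 0.125 }))
// [ { label: 'happy',   widthPct: 100, text: '50.0%' },
//   { label: 'neutral', widthPct: 50,  text: '25.0%' },
//   { label: 'sad',     widthPct: 25,  text: '12.5%' } ]
```

Note that the bar width is relative to the largest entry, while the printed text multiplies the raw value by 100, so the text only reads as a percentage when the inputs are probabilities in 0..1.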
@@ -0,0 +1,89 @@
import { useMemo, useRef, useEffect, useState } from 'react'

// EmbeddingInspector — compact visualization of a raw vector returned by /v1/face|voice/embed.
// embedding: number[] (can be large). dim: int. model: string.
export default function EmbeddingInspector({ embedding, dim, model, elapsedMs }) {
  const canvasRef = useRef(null)
  const [copied, setCopied] = useState(false)

  const summary = useMemo(() => {
    if (!embedding || !embedding.length) return null
    let sum = 0, sumSq = 0, min = Infinity, max = -Infinity
    for (const v of embedding) {
      sum += v
      sumSq += v * v
      if (v < min) min = v
      if (v > max) max = v
    }
    const mean = sum / embedding.length
    const norm = Math.sqrt(sumSq)
    return { mean, norm, min, max }
  }, [embedding])

  useEffect(() => {
    if (!canvasRef.current || !embedding?.length) return
    const canvas = canvasRef.current
    const dpr = window.devicePixelRatio || 1
    const cssW = canvas.clientWidth
    const cssH = 60
    canvas.width = Math.floor(cssW * dpr)
    canvas.height = Math.floor(cssH * dpr)
    const ctx = canvas.getContext('2d')
    ctx.scale(dpr, dpr)
    ctx.clearRect(0, 0, cssW, cssH)

    const COUNT = Math.min(embedding.length, 128)
    const values = embedding.slice(0, COUNT)
    const max = Math.max(...values.map(Math.abs)) || 1
    const mid = cssH / 2
    const barW = cssW / COUNT
    const accent = getComputedStyle(canvas).getPropertyValue('--color-accent').trim() || '#e8a87c'
    const accentMuted = getComputedStyle(canvas).getPropertyValue('--color-text-muted').trim() || '#6c7084'
    ctx.strokeStyle = accentMuted
    ctx.beginPath()
    ctx.moveTo(0, mid + 0.5)
    ctx.lineTo(cssW, mid + 0.5)
    ctx.stroke()
    ctx.fillStyle = accent
    for (let i = 0; i < COUNT; i++) {
      const v = values[i]
      const h = (Math.abs(v) / max) * (cssH * 0.45)
      if (v >= 0) ctx.fillRect(i * barW, mid - h, Math.max(0.5, barW - 0.5), h)
      else ctx.fillRect(i * barW, mid, Math.max(0.5, barW - 0.5), h)
    }
  }, [embedding])

  if (!embedding) return null

  const copy = async () => {
    try {
      await navigator.clipboard.writeText(JSON.stringify(embedding))
      setCopied(true)
      setTimeout(() => setCopied(false), 1500)
    } catch (_) {
      /* clipboard gated */
    }
  }

  return (
    <div className="biometrics-embed card">
      <div className="biometrics-embed__head">
        <div>
          <div className="biometrics-embed__title">Embedding vector</div>
          <div className="biometrics-embed__meta">
            {dim != null && <span><strong>{dim}</strong> dims</span>}
            {summary && <span>L2 <strong>{summary.norm.toFixed(3)}</strong></span>}
            {summary && <span>range <strong>[{summary.min.toFixed(3)}, {summary.max.toFixed(3)}]</strong></span>}
            {model && <span>model <code>{model}</code></span>}
            {elapsedMs != null && <span>{elapsedMs.toFixed(0)} ms</span>}
          </div>
        </div>
        <button type="button" className="btn btn-secondary btn-sm" onClick={copy}>
          <i className={`fas ${copied ? 'fa-check' : 'fa-copy'}`} aria-hidden="true" />
          {copied ? ' Copied' : ' Copy JSON'}
        </button>
      </div>
      <canvas ref={canvasRef} style={{ width: '100%', height: 60 }} aria-label="Embedding sparkline (first 128 dimensions)" />
    </div>
  )
}
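The summary line in `EmbeddingInspector` (L2 norm and value range) is a single pass over the vector. As a rough illustration, the same loop can be run outside the `useMemo` hook; `summarizeEmbedding` is a hypothetical name and the input vector is a toy example.

```javascript
// Hypothetical standalone version of the summary computed in EmbeddingInspector.
function summarizeEmbedding(embedding) {
  if (!embedding || !embedding.length) return null
  let sum = 0, sumSq = 0, min = Infinity, max = -Infinity
  for (const v of embedding) {
    sum += v
    sumSq += v * v
    if (v < min) min = v
    if (v > max) max = v
  }
  return {
    mean: sum / embedding.length,
    norm: Math.sqrt(sumSq), // L2 (Euclidean) norm
    min,
    max,
  }
}

console.log(summarizeEmbedding([3, -4]))
// { mean: -0.5, norm: 5, min: -4, max: 3 }
```

An L2 norm close to 1 usually indicates that the backend returns normalised embeddings, though whether a given model does so is not stated in this diff.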
Some files were not shown because too many files have changed in this diff