mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-20 06:35:41 -04:00
Compare commits
45 Commits
feat/buun-
...
feat/distr

| SHA1 |
|---|
| 3280b9a287 |
| 375bf1929d |
| 9a7f5e68bd |
| 6b63b47f61 |
| f4036fa83f |
| 3810fe1a1e |
| bdfa5e934a |
| deca6dbdad |
| 60549a8a60 |
| 54728e292f |
| 86fd62233f |
| 41ed8ced70 |
| 05e94bd9e7 |
| 8d124d080f |
| 2da1a4d230 |
| 988430c850 |
| b336d9c626 |
| f384c64a91 |
| e9d8e92988 |
| 5b0196c7d0 |
| c8d63a1003 |
| d9cb0d6133 |
| f5c268deac |
| 8931a2ad31 |
| e16e758dff |
| 1c45227346 |
| fbe4f0a99b |
| d733c9cd13 |
| 703b4fcae8 |
| 73aacad2f9 |
| 806ea24ff4 |
| 385de3705e |
| 21eace40ec |
| 24505e57f5 |
| d09706dc60 |
| 08e393f7db |
| 47cc3dc8d7 |
| 83b384de97 |
| 487e3fd2a4 |
| 9ab3496de2 |
| c4511be33a |
| 551ebdb57a |
| 1d0de757c3 |
| e5337039b0 |
| 1c9592c77f |


```diff
@@ -43,7 +43,7 @@ If you add a new language bucket, `scripts/changed-backends.js` also needs a bra

 **Additional build types you may need:**
 - ROCm/HIP: Use `build-type: 'hipblas'` with `base-image: "rocm/dev-ubuntu-24.04:7.2.1"`
-- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"`
+- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"`
 - L4T (ARM): Use `build-type: 'l4t'` with `platforms: 'linux/arm64'` and `runs-on: 'ubuntu-24.04-arm'`

 ## 3. Add Backend Metadata to `backend/index.yaml`
```


````diff
@@ -35,33 +35,19 @@ All contributions must comply with LocalAI's licensing requirements:

 ## Signed-off-by and Developer Certificate of Origin

-Only humans can certify the Developer Certificate of Origin (DCO). AI
-agents MUST NOT invent or guess a human identity for `Signed-off-by` —
-doing so forges the DCO certification.
+**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally
+certify the Developer Certificate of Origin (DCO). The human submitter
+is responsible for:

-However, when a human operator explicitly directs the AI to commit on
-their behalf, the AI is acting as a typing tool — no different from an
-editor macro or `git commit -s`. In that case the AI SHOULD add
-`Signed-off-by:` using the **configured `user.name` / `user.email`** of
-the current git repository (i.e. the operator's own identity). The
-resulting trailer is the operator's signature; they take responsibility
-for it by reviewing and pushing the commit. The AI MUST NOT use any
-other identity and MUST NOT add its own name to the sign-off.
-
-When running `git commit`, prefer `git commit --signoff` (or `-s`) so
-the trailer is emitted by git itself from the configured identity,
-rather than hand-writing it in a heredoc — this guarantees the sign-off
-matches whatever identity the operator is currently using.
-
-The human submitter remains responsible for:
-
-- Reviewing all AI-generated code before it's pushed or merged
+- Reviewing all AI-generated code
 - Ensuring compliance with licensing requirements
+- Adding their own `Signed-off-by` tag (when the project requires DCO)
+to certify the contribution
 - Taking full responsibility for the contribution

-AI agents MUST NOT add `Co-Authored-By` trailers for themselves. A human
-reviewer owns the contribution; the AI's involvement is recorded via
-`Assisted-by` (see below).
+AI agents MUST NOT add `Co-Authored-By` trailers for themselves either.
+A human reviewer owns the contribution; the AI's involvement is recorded
+via `Assisted-by` (see below).

 ## Attribution

@@ -98,12 +84,6 @@ Assisted-by: Claude:claude-opus-4-7 golangci-lint
 Signed-off-by: Jane Developer <jane@example.com>
 ```

-The `Signed-off-by` line uses Jane's own identity because Jane is the
-submitter operating the AI. If Jane asks Claude to create the commit via
-`git commit -s`, git emits that exact trailer from Jane's configured
-identity — no separate human step is needed beyond Jane reviewing the
-diff before pushing.
-
 ## Scope and Responsibility

 Using an AI assistant does not reduce the contributor's responsibility.
````

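The trailer block in the attribution example (`Assisted-by: Claude:claude-opus-4-7 golangci-lint` followed by Jane's `Signed-off-by`) can be produced with stock git, with the sign-off taken from the repository's configured identity instead of being typed by hand. This is a minimal sketch run in a throwaway repository; the identity, e-mail address and subject line are placeholders, and the `Assisted-by` line is passed as the last message paragraph rather than through any project tooling.

```bash
#!/usr/bin/env bash
# Sketch: build an Assisted-by + Signed-off-by trailer block with plain git.
# Everything happens in a temporary repository; identity and subject are placeholders.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q

# `git commit -s` reads these two settings to build the Signed-off-by trailer.
git -C "$repo" config user.name "Jane Developer"
git -C "$repo" config user.email "jane@example.com"

# Each -m becomes its own paragraph; -s appends the sign-off at the end of the message.
git -C "$repo" commit -q --allow-empty -s \
  -m 'Fix flaky test' \
  -m 'Assisted-by: Claude:claude-opus-4-7 golangci-lint'

msg=$(git -C "$repo" log -1 --format=%B)
printf '%s\n' "$msg"
```

Because the sign-off comes from `user.name` / `user.email`, it always matches whichever identity is configured in the repository where the human submitter commits.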

111
.agents/ci-caching.md
Normal file

@@ -0,0 +1,111 @@

# CI Build Caching

Container builds — both the root LocalAI image (`Dockerfile`) and the per-backend images (`backend/Dockerfile.*`) — share a registry-backed BuildKit cache. This file explains how that cache is laid out, what invalidates it, and how to bypass it.

## Cache layout

- **Cache registry**: `quay.io/go-skynet/ci-cache`
- **One tag per matrix entry**, derived from the existing `tag-suffix`:
  - Backend builds (`backend_build.yml`): `cache<tag-suffix>`
    - e.g. `cache-gpu-nvidia-cuda-12-llama-cpp`, `cache-cpu-vllm`, `cache-nvidia-l4t-cuda-13-arm64-vllm`
  - Root image builds (`image_build.yml`): `cache-localai<tag-suffix>`
    - e.g. `cache-localai-gpu-nvidia-cuda-12`, `cache-localai-gpu-vulkan`
- Each tag stores a multi-arch BuildKit cache manifest (`mode=max`), so every intermediate stage is re-usable, not just the final image.

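The naming rule above is mechanical, so it fits in two small shell helpers. This is only a sketch: the function names are hypothetical, it assumes every `tag-suffix` starts with `-` as all the examples do, and in CI the ref is built inline in YAML (for example `cache${{ inputs.tag-suffix }}` in `backend_build.yml`), not by a script.

```bash
#!/usr/bin/env bash
set -euo pipefail

CACHE_REPO="quay.io/go-skynet/ci-cache"

# Cache ref for a backend image (backend_build.yml): cache<tag-suffix>
backend_cache_ref() {
  printf '%s:cache%s\n' "$CACHE_REPO" "$1"
}

# Cache ref for a root LocalAI image (image_build.yml): cache-localai<tag-suffix>
image_cache_ref() {
  printf '%s:cache-localai%s\n' "$CACHE_REPO" "$1"
}

backend_cache_ref '-gpu-nvidia-cuda-12-llama-cpp'
# quay.io/go-skynet/ci-cache:cache-gpu-nvidia-cuda-12-llama-cpp
image_cache_ref '-gpu-vulkan'
# quay.io/go-skynet/ci-cache:cache-localai-gpu-vulkan
```

Since the suffix is the entire namespace, two matrix entries with the same `tag-suffix` resolve to the same ref; that is why the checklist at the end of this file requires unique suffixes.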

## Read/write semantics

| Trigger | `cache-from` | `cache-to` |
|---|---|---|
| `push` to `master` / tag | yes | yes (`mode=max,ignore-error=true`) |
| `pull_request` | yes | **no** |

PR builds read master's warm cache but never write — this prevents PRs from polluting the shared cache with their experimental state. After merge, the master build for that matrix entry refreshes the cache.

`ignore-error=true` on the write side means a transient quay push failure does not fail the build; the next master push retries.

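To reproduce a CI build locally with the same cache behaviour, the table above maps onto two `docker buildx build` flags. A minimal sketch: `cache_args` is a hypothetical helper that prints one flag per line and adds `--cache-to` only for non-PR events, mirroring the gating in `backend_build.yml`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# cache_args EVENT REF: print the buildx cache flags for one matrix entry,
# one per line. Every event reads the cache; only non-PR events write it.
cache_args() {
  local event="$1" ref="$2"
  printf '%s\n' "--cache-from=type=registry,ref=${ref}"
  if [ "$event" != "pull_request" ]; then
    printf '%s\n' "--cache-to=type=registry,ref=${ref},mode=max,ignore-error=true"
  fi
}

ref="quay.io/go-skynet/ci-cache:cache-cpu-vllm"
cache_args push "$ref"          # both flags
cache_args pull_request "$ref"  # --cache-from only
```

With bash 4+, `mapfile -t flags < <(cache_args push "$ref")` collects the output into an array for `docker buildx build "${flags[@]}" ...`. Depending on the local Docker setup, exporting a registry cache may need a `docker-container` builder (for example one created with `docker buildx create --use`), and writing always needs push access to the cache repository.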

## Self-warming, no separate populator

There is no cron job that pre-warms the cache. The production builds *are* the populator. The first master build of a given matrix entry pays the cold cost; subsequent same-entry master builds reuse everything that hasn't changed (apt installs, gRPC compile in `Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`, Python wheel installs, etc.).

Historically there was a `generate_grpc_cache.yaml` cron that targeted a `grpc` stage in the root Dockerfile. That stage was removed in July 2025 and the cron silently failed every night for 9 months without writing anything. It was deleted along with the registry-cache rollout.

## The `DEPS_REFRESH` cache-buster (Python backends)

Every Python backend goes through the shared `backend/Dockerfile.python`, which ends with:

```dockerfile
ARG DEPS_REFRESH=initial
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
```

Most Python backends ship `requirements*.txt` files that **do not pin every transitive dep** (`torch`, `transformers`, `vllm`, `diffusers`, etc. are listed without a `==` pin, or with `>=` lower bounds only). With a warm BuildKit cache, the `make` layer hashes only on Dockerfile instructions + COPYed source — not on what `pip install` resolves at runtime. So a warm cache would ship the *first* version of `vllm` ever cached and never pick up upstream releases.

`DEPS_REFRESH` defends against that:

- `backend_build.yml` computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W17`) before each build and passes it as a build-arg.
- The `RUN ... make` layer's BuildKit hash now includes that string, so the layer invalidates **at most once per week**, automatically picking up newer wheels.
- Within a week, builds stay warm.

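The key is plain `date` output, so its weekly behaviour is easy to check. The sketch wraps the same format string in a hypothetical helper and evaluates fixed dates with GNU `date -d` (BSD/macOS `date` does not accept `-d`). It also surfaces one caveat: `%Y` is the calendar year while `%V` is the ISO week number, so around New Year the two can disagree. For example, 2025-12-29 belongs to ISO week 1 of 2026 but yields `2025-W01`. `%G`, the ISO week-based year, is the specifier that pairs with `%V` if an exact ISO label is ever wanted.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Same format string as the "Compute deps refresh key" step in backend_build.yml.
# DATE defaults to now; GNU date is assumed for -d.
deps_refresh_key() {
  date -u -d "${1:-now}" +%Y-W%V
}

# ISO-consistent variant: %G is the year that owns ISO week %V.
iso_week_key() {
  date -u -d "${1:-now}" +%G-W%V
}

deps_refresh_key 2026-04-20   # 2026-W17 (Monday)
deps_refresh_key 2026-04-26   # 2026-W17 (Sunday, same ISO week, cache stays warm)
deps_refresh_key 2026-04-27   # 2026-W18 (next Monday, the make layer rebuilds)
```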

This applies only to `Dockerfile.python` because:
- Go (`Dockerfile.golang`) pins versions in `go.mod` / `go.sum`.
- Rust (`Dockerfile.rust`) pins via `Cargo.lock`.
- C++ backends (`Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`) clone gRPC at a pinned tag (`v1.65.0`) and llama.cpp at a pinned commit; their inputs don't drift between rebuilds.

### Adjusting the cadence

If you need a faster refresh (e.g. while debugging an upstream flake), bump the format to daily (`+%Y-%m-%d`) or hourly (`+%Y-%m-%d-%H`). If you need a one-shot rebuild for a specific backend without changing the schedule, append a marker to the tag-suffix in the matrix or temporarily delete that backend's cache tag in quay.

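Changing the cadence only changes the `date` format string. A sketch of the three options behind one hypothetical helper, again assuming GNU `date -d` for the fixed example timestamp; in CI the equivalent change is editing the format in the `Compute deps refresh key` step.

```bash
#!/usr/bin/env bash
set -euo pipefail

# refresh_key CADENCE [DATE]: cache-buster string for a weekly, daily or hourly refresh.
refresh_key() {
  local fmt
  case "$1" in
    weekly) fmt='+%Y-W%V' ;;
    daily) fmt='+%Y-%m-%d' ;;
    hourly) fmt='+%Y-%m-%d-%H' ;;
    *) echo "unknown cadence: $1" >&2; return 1 ;;
  esac
  date -u -d "${2:-now}" "$fmt"
}

refresh_key weekly '2026-04-20 13:45'   # 2026-W17
refresh_key daily '2026-04-20 13:45'    # 2026-04-20
refresh_key hourly '2026-04-20 13:45'   # 2026-04-20-13
```

A finer cadence means more cold `make` layers per backend, so the hourly form is best kept to short debugging windows.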

## Manually evicting cache

To force a fully cold build for one backend or the whole image:

```bash
# Delete a single tag (requires quay credentials with admin on the repo)
curl -X DELETE \
  -H "Authorization: Bearer ${QUAY_TOKEN}" \
  https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/cache-gpu-nvidia-cuda-12-vllm

# List tag names (first page only, up to 100 entries)
curl -s -H "Authorization: Bearer ${QUAY_TOKEN}" \
  "https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/?limit=100" | jq '.tags[].name'
```

Eviction is rarely needed in normal operation: `DEPS_REFRESH` handles weekly drift, source changes invalidate layers naturally, and because every matrix entry reads and writes its own cache tag, a stale tag never bleeds into a different build.

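The commands above work on one tag at a time. For the whole-image case, one option is to list the root-image tags (prefix `cache-localai`) and delete each one. This is a sketch under assumptions: it reuses only the two quay endpoints shown above, needs `curl`, `jq` and a `QUAY_TOKEN` with admin rights, reads just the first page of up to 100 tags, and defaults to a dry run. The function names and the `DRY_RUN` switch are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

CACHE_API="https://quay.io/api/v1/repository/go-skynet/ci-cache/tag"

# Names of root-image cache tags on the first page of the tag listing.
list_image_cache_tags() {
  curl -fsS -H "Authorization: Bearer ${QUAY_TOKEN}" "${CACHE_API}/?limit=100" |
    jq -r '.tags[].name | select(startswith("cache-localai"))'
}

# evict_tags TAG...: only prints what would be deleted unless DRY_RUN=0.
evict_tags() {
  local tag
  for tag in "$@"; do
    if [ "${DRY_RUN:-1}" != 0 ]; then
      printf 'would delete %s\n' "$tag"
    else
      curl -fsS -X DELETE -H "Authorization: Bearer ${QUAY_TOKEN}" "${CACHE_API}/${tag}"
    fi
  done
}

# Usage (network access and credentials required):
#   evict_tags $(list_image_cache_tags)             # preview
#   DRY_RUN=0 evict_tags $(list_image_cache_tags)   # delete
```

Running the preview first is cheap insurance, since deleting every root-image tag makes every root image build cold on its next master run.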

## What the cache **does not** cover

- The "Free Disk Space" / "Release space from worker" steps run on every job; they reclaim ~6 GB on `ubuntu-latest` runners. They clean up runner state rather than build layers, so BuildKit caching does not apply to them.
- Intermediate artifacts of `Build and push (PR)` are not pushed anywhere; PRs only build for verification.
- Darwin builds (see below): macOS runners have no Docker daemon, so the registry-backed BuildKit cache cannot apply.

## Darwin native caches

`backend_build_darwin.yml` runs natively on `macOS-14` GitHub-hosted runners: there is no Docker, no BuildKit, and no cross-job registry cache. Instead, the reusable workflow uses `actions/cache@v4` for four native caches that mirror the spirit of the Linux cache (warm by default, weekly refresh for unpinned Python deps, PRs read-only).

| Cache | Path(s) | Key | Scope |
|---|---|---|---|
| Go modules + build | `~/go/pkg/mod`, `~/Library/Caches/go-build` | `go.sum` (managed by `actions/setup-go@v5` `cache: true`) | All darwin jobs |
| Homebrew | `~/Library/Caches/Homebrew/downloads`, selected `/opt/homebrew/Cellar/*` | hash of `backend_build_darwin.yml` | All darwin jobs |
| ccache (llama.cpp CMake) | `~/Library/Caches/ccache` | runner arch + pinned `LLAMA_VERSION` from `backend/cpp/llama-cpp/Makefile` + run ID (restored by prefix) | `inputs.backend == 'llama-cpp'` only |
| Python wheels (uv + pip) | `~/Library/Caches/pip`, `~/Library/Caches/uv` | `inputs.backend` + ISO week (`+%Y-W%V`) + hash of that backend's `requirements*.txt` | `inputs.lang == 'python'` only |

Read/write semantics match the BuildKit cache: `actions/cache/restore` runs every time, `actions/cache/save` is gated on `github.event_name != 'pull_request'`. PRs read master's warm cache but never write back.

The Python wheel cache uses the same ISO-week cache-buster as the Linux `DEPS_REFRESH` build-arg: same problem (unpinned `torch`/`mlx`/`diffusers`/`transformers` resolve to fresh wheels weekly), same solution of roughly one cold rebuild per week.

The brew Cellar cache requires `HOMEBREW_NO_AUTO_UPDATE=1` and `HOMEBREW_NO_INSTALL_CLEANUP=1` (set as job-level env). Without those, `brew install` would mutate the very directories that were just restored, defeating the cache.

For ccache, the workflow exports `CMAKE_ARGS=… -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache` via `$GITHUB_ENV` before running `make build-darwin-go-backend`. The Makefile in `backend/cpp/llama-cpp/` already forwards `CMAKE_ARGS` through to each variant build (`fallback`, `grpc`, `rpc-server`), so no script changes are needed. The three variants share most translation units (TUs), so ccache dedupes object files across them.

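The ccache entry is saved under a key ending in `github.run_id` (see the `Restore ccache` and `Save ccache` steps in `backend_build_darwin.yml`). An existing cache entry cannot be overwritten, so each master run writes a fresh entry and the next run finds the newest one through the `restore-keys` prefix. The sketch below is a simplified model of the documented `actions/cache` lookup (exact key first, otherwise the most recently created entry whose key starts with a restore prefix). It ignores branch scoping, and its input format (one `EPOCH KEY` pair per line), the function name and the `abc123` version string are all illustrative assumptions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# pick_restore KEY PREFIX: reads "EPOCH KEY" lines on stdin (a stand-in for the
# repository's cache list) and prints the entry a restore would pick.
pick_restore() {
  local key="$1" prefix="$2" entries
  entries=$(cat)
  # 1. An exact match on the primary key wins.
  if awk -v k="$key" '$2 == k { found = 1 } END { exit !found }' <<<"$entries"; then
    printf '%s\n' "$key"
    return 0
  fi
  # 2. Otherwise: the most recently created entry whose key starts with PREFIX.
  awk -v p="$prefix" 'index($2, p) == 1' <<<"$entries" |
    sort -n -k1,1 | tail -n 1 | cut -d' ' -f2
}

# A new run (run ID 300) has no exact hit, so it restores run 200's entry.
printf '%s\n' '100 ccache-llama-ARM64-abc123-100' '200 ccache-llama-ARM64-abc123-200' |
  pick_restore ccache-llama-ARM64-abc123-300 ccache-llama-ARM64-abc123-
```

Bumping `LLAMA_VERSION` changes the prefix, so no older entry matches and the first build on the new pin starts cold, which is the clean invalidation the workflow comment describes.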

### Cache budget on Darwin

GitHub Actions caches are limited to 10 GB per repo. Steady-state worst case: ~800 MB Go cache + ~2 GB brew Cellar + up to 2 GB ccache + ~1.5 GB × 5 python backends. If the cap is hit, prefer collapsing the per-backend Python keys into a shared `pyenv-darwin-shared-<week>` key (accepts more cross-backend churn for a smaller footprint) before reducing other caches.

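Summing the estimates above shows why that fallback matters: taken at face value, the worst case already exceeds the 10 GB limit. The sum also counts a single ccache entry; because each ccache save uses a new run-ID key, older entries can coexist with the newest one until they are evicted.

```math
\underbrace{0.8}_{\text{Go}} + \underbrace{2}_{\text{brew}} + \underbrace{2}_{\text{ccache}} + \underbrace{5 \times 1.5}_{\text{Python}} = 12.3\ \text{GB} > 10\ \text{GB}
```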

## Touching the cache pipeline

When changing `image_build.yml`, `backend_build.yml`, or any of the `backend/Dockerfile.*` files:

1. **Don't drop `DEPS_REFRESH=...` from the build-args** without a replacement strategy (lockfiles, pinned requirements). Otherwise master will silently freeze on whichever versions were cached at the time.
2. **Keep `tag-suffix` unique per matrix entry** — it's the cache namespace. Two matrix entries sharing a tag-suffix would clobber each other's cache.
3. **Keep `cache-to` gated on `github.event_name != 'pull_request'`** — PRs must not write.
4. **Keep `ignore-error=true` on `cache-to`** — quay registry hiccups must not fail builds.

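Rules 1, 2 and 4 are simple enough to check mechanically before merging. Below is a sketch of a hypothetical lint (nothing like it is referenced in this file): it greps a matrix file for duplicate `tag-suffix` values and a build workflow for the `DEPS_REFRESH` build-arg and for `ignore-error=true` on every `cache-to`. Rule 3 depends on step conditions and is left to review. The patterns assume the single-quoted, one-key-per-line style used in `backend.yml`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Rule 2: each tag-suffix in a matrix file must be unique (it is the cache namespace).
check_unique_tag_suffixes() {
  local dupes
  dupes=$(grep -o "tag-suffix: '[^']*'" "$1" | sort | uniq -d || true)
  if [ -n "$dupes" ]; then
    printf '%s: duplicate tag-suffix values:\n%s\n' "$1" "$dupes" >&2
    return 1
  fi
}

# Rules 1 and 4: DEPS_REFRESH is passed, and every cache-to keeps ignore-error=true.
check_build_workflow() {
  local bad
  if ! grep -q 'DEPS_REFRESH=' "$1"; then
    printf '%s: DEPS_REFRESH build-arg missing\n' "$1" >&2
    return 1
  fi
  bad=$(grep 'cache-to:' "$1" | grep -v 'ignore-error=true' || true)
  if [ -n "$bad" ]; then
    printf '%s: cache-to without ignore-error=true:\n%s\n' "$1" "$bad" >&2
    return 1
  fi
}
```

A possible invocation is `check_unique_tag_suffixes .github/workflows/backend.yml && check_build_workflow .github/workflows/backend_build.yml`.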

178
.github/workflows/backend.yml
vendored

```diff
@@ -141,7 +141,7 @@ jobs:
 - build-type: ''
   cuda-major-version: ""
   cuda-minor-version: ""
-  platforms: 'linux/amd64'
+  platforms: 'linux/amd64,linux/arm64'
   tag-latest: 'auto'
   tag-suffix: '-cpu-whisperx'
   runs-on: 'ubuntu-latest'
@@ -154,7 +154,7 @@ jobs:
 - build-type: ''
   cuda-major-version: ""
   cuda-minor-version: ""
-  platforms: 'linux/amd64'
+  platforms: 'linux/amd64,linux/arm64'
   tag-latest: 'auto'
   tag-suffix: '-cpu-faster-whisper'
   runs-on: 'ubuntu-latest'
@@ -399,19 +399,6 @@ jobs:
   dockerfile: "./backend/Dockerfile.turboquant"
   context: "./"
   ubuntu-version: '2404'
-- build-type: 'cublas'
-  cuda-major-version: "12"
-  cuda-minor-version: "8"
-  platforms: 'linux/amd64'
-  tag-latest: 'auto'
-  tag-suffix: '-gpu-nvidia-cuda-12-buun-llama-cpp'
-  runs-on: 'bigger-runner'
-  base-image: "ubuntu:24.04"
-  skip-drivers: 'false'
-  backend: "buun-llama-cpp"
-  dockerfile: "./backend/Dockerfile.buun-llama-cpp"
-  context: "./"
-  ubuntu-version: '2404'
 - build-type: 'cublas'
   cuda-major-version: "12"
   cuda-minor-version: "8"
@@ -907,19 +894,6 @@ jobs:
   dockerfile: "./backend/Dockerfile.turboquant"
   context: "./"
   ubuntu-version: '2404'
-- build-type: 'cublas'
-  cuda-major-version: "13"
-  cuda-minor-version: "0"
-  platforms: 'linux/amd64'
-  tag-latest: 'auto'
-  tag-suffix: '-gpu-nvidia-cuda-13-buun-llama-cpp'
-  runs-on: 'ubuntu-latest'
-  base-image: "ubuntu:24.04"
-  skip-drivers: 'false'
-  backend: "buun-llama-cpp"
-  dockerfile: "./backend/Dockerfile.buun-llama-cpp"
-  context: "./"
-  ubuntu-version: '2404'
 - build-type: 'cublas'
   cuda-major-version: "13"
   cuda-minor-version: "0"
@@ -949,16 +923,29 @@ jobs:
 - build-type: 'cublas'
   cuda-major-version: "13"
   cuda-minor-version: "0"
-  platforms: 'linux/arm64'
-  skip-drivers: 'false'
+  platforms: 'linux/amd64'
   tag-latest: 'auto'
-  tag-suffix: '-nvidia-l4t-cuda-13-arm64-buun-llama-cpp'
+  tag-suffix: '-gpu-nvidia-cuda-13-vllm'
+  runs-on: 'arc-runner-set'
   base-image: "ubuntu:24.04"
-  runs-on: 'ubuntu-24.04-arm'
-  ubuntu-version: '2404'
-  backend: "buun-llama-cpp"
-  dockerfile: "./backend/Dockerfile.buun-llama-cpp"
+  skip-drivers: 'false'
+  backend: "vllm"
+  dockerfile: "./backend/Dockerfile.python"
   context: "./"
+  ubuntu-version: '2404'
+- build-type: 'cublas'
+  cuda-major-version: "13"
+  cuda-minor-version: "0"
+  platforms: 'linux/amd64'
+  tag-latest: 'auto'
+  tag-suffix: '-gpu-nvidia-cuda-13-vllm-omni'
+  runs-on: 'arc-runner-set'
+  base-image: "ubuntu:24.04"
+  skip-drivers: 'false'
+  backend: "vllm-omni"
+  dockerfile: "./backend/Dockerfile.python"
+  context: "./"
+  ubuntu-version: '2404'
 - build-type: 'cublas'
   cuda-major-version: "13"
   cuda-minor-version: "0"
@@ -1115,6 +1102,45 @@ jobs:
   backend: "diffusers"
   dockerfile: "./backend/Dockerfile.python"
   context: "./"
+- build-type: 'l4t'
+  cuda-major-version: "13"
+  cuda-minor-version: "0"
+  platforms: 'linux/arm64'
+  tag-latest: 'auto'
+  tag-suffix: '-nvidia-l4t-cuda-13-arm64-vllm'
+  runs-on: 'ubuntu-24.04-arm'
+  base-image: "ubuntu:24.04"
+  skip-drivers: 'false'
+  ubuntu-version: '2404'
+  backend: "vllm"
+  dockerfile: "./backend/Dockerfile.python"
+  context: "./"
+- build-type: 'l4t'
+  cuda-major-version: "13"
+  cuda-minor-version: "0"
+  platforms: 'linux/arm64'
+  tag-latest: 'auto'
+  tag-suffix: '-nvidia-l4t-cuda-13-arm64-vllm-omni'
+  runs-on: 'ubuntu-24.04-arm'
+  base-image: "ubuntu:24.04"
+  skip-drivers: 'false'
+  ubuntu-version: '2404'
+  backend: "vllm-omni"
+  dockerfile: "./backend/Dockerfile.python"
+  context: "./"
+- build-type: 'l4t'
+  cuda-major-version: "13"
+  cuda-minor-version: "0"
+  platforms: 'linux/arm64'
+  tag-latest: 'auto'
+  tag-suffix: '-nvidia-l4t-cuda-13-arm64-sglang'
+  runs-on: 'ubuntu-24.04-arm'
+  base-image: "ubuntu:24.04"
+  skip-drivers: 'false'
+  ubuntu-version: '2404'
+  backend: "sglang"
+  dockerfile: "./backend/Dockerfile.python"
+  context: "./"
 - build-type: 'l4t'
   cuda-major-version: "13"
   cuda-minor-version: "0"
@@ -1493,19 +1519,6 @@ jobs:
   dockerfile: "./backend/Dockerfile.turboquant"
   context: "./"
   ubuntu-version: '2404'
-- build-type: 'hipblas'
-  cuda-major-version: ""
-  cuda-minor-version: ""
-  platforms: 'linux/amd64'
-  tag-latest: 'auto'
-  tag-suffix: '-gpu-rocm-hipblas-buun-llama-cpp'
-  runs-on: 'ubuntu-latest'
-  base-image: "rocm/dev-ubuntu-24.04:7.2.1"
-  skip-drivers: 'false'
-  backend: "buun-llama-cpp"
-  dockerfile: "./backend/Dockerfile.buun-llama-cpp"
-  context: "./"
-  ubuntu-version: '2404'
 - build-type: 'hipblas'
   cuda-major-version: ""
   cuda-minor-version: ""
@@ -1723,7 +1736,7 @@ jobs:
   tag-latest: 'auto'
   tag-suffix: '-gpu-intel-rerankers'
   runs-on: 'ubuntu-latest'
-  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
+  base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
   skip-drivers: 'false'
   backend: "rerankers"
   dockerfile: "./backend/Dockerfile.python"
@@ -1736,7 +1749,7 @@ jobs:
   tag-latest: 'auto'
   tag-suffix: '-gpu-intel-sycl-f32-llama-cpp'
   runs-on: 'ubuntu-latest'
-  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
+  base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
   skip-drivers: 'false'
   backend: "llama-cpp"
   dockerfile: "./backend/Dockerfile.llama-cpp"
@@ -1755,19 +1768,6 @@ jobs:
   dockerfile: "./backend/Dockerfile.turboquant"
   context: "./"
   ubuntu-version: '2404'
-- build-type: 'sycl_f32'
-  cuda-major-version: ""
-  cuda-minor-version: ""
-  platforms: 'linux/amd64'
-  tag-latest: 'auto'
-  tag-suffix: '-gpu-intel-sycl-f32-buun-llama-cpp'
-  runs-on: 'ubuntu-latest'
-  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
-  skip-drivers: 'false'
-  backend: "buun-llama-cpp"
-  dockerfile: "./backend/Dockerfile.buun-llama-cpp"
-  context: "./"
-  ubuntu-version: '2404'
 - build-type: 'sycl_f16'
   cuda-major-version: ""
   cuda-minor-version: ""
@@ -1794,19 +1794,6 @@ jobs:
   dockerfile: "./backend/Dockerfile.turboquant"
   context: "./"
   ubuntu-version: '2404'
-- build-type: 'sycl_f16'
-  cuda-major-version: ""
-  cuda-minor-version: ""
-  platforms: 'linux/amd64'
-  tag-latest: 'auto'
-  tag-suffix: '-gpu-intel-sycl-f16-buun-llama-cpp'
-  runs-on: 'ubuntu-latest'
-  base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
-  skip-drivers: 'false'
-  backend: "buun-llama-cpp"
-  dockerfile: "./backend/Dockerfile.buun-llama-cpp"
-  context: "./"
-  ubuntu-version: '2404'
 - build-type: 'intel'
   cuda-major-version: ""
   cuda-minor-version: ""
@@ -2212,19 +2199,6 @@ jobs:
   dockerfile: "./backend/Dockerfile.turboquant"
   context: "./"
   ubuntu-version: '2404'
-- build-type: ''
-  cuda-major-version: ""
-  cuda-minor-version: ""
-  platforms: 'linux/amd64,linux/arm64'
-  tag-latest: 'auto'
-  tag-suffix: '-cpu-buun-llama-cpp'
-  runs-on: 'bigger-runner'
-  base-image: "ubuntu:24.04"
-  skip-drivers: 'false'
-  backend: "buun-llama-cpp"
-  dockerfile: "./backend/Dockerfile.buun-llama-cpp"
-  context: "./"
-  ubuntu-version: '2404'
 - build-type: ''
   cuda-major-version: ""
   cuda-minor-version: ""
@@ -2264,19 +2238,6 @@ jobs:
   dockerfile: "./backend/Dockerfile.turboquant"
   context: "./"
   ubuntu-version: '2204'
-- build-type: 'cublas'
-  cuda-major-version: "12"
-  cuda-minor-version: "0"
-  platforms: 'linux/arm64'
-  skip-drivers: 'false'
-  tag-latest: 'auto'
-  tag-suffix: '-nvidia-l4t-arm64-buun-llama-cpp'
-  base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
-  runs-on: 'ubuntu-24.04-arm'
-  backend: "buun-llama-cpp"
-  dockerfile: "./backend/Dockerfile.buun-llama-cpp"
-  context: "./"
-  ubuntu-version: '2204'
 - build-type: 'vulkan'
   cuda-major-version: ""
   cuda-minor-version: ""
@@ -2303,19 +2264,6 @@ jobs:
   dockerfile: "./backend/Dockerfile.turboquant"
   context: "./"
   ubuntu-version: '2404'
-- build-type: 'vulkan'
-  cuda-major-version: ""
-  cuda-minor-version: ""
-  platforms: 'linux/amd64,linux/arm64'
-  tag-latest: 'auto'
-  tag-suffix: '-gpu-vulkan-buun-llama-cpp'
-  runs-on: 'bigger-runner'
-  base-image: "ubuntu:24.04"
-  skip-drivers: 'false'
-  backend: "buun-llama-cpp"
-  dockerfile: "./backend/Dockerfile.buun-llama-cpp"
-  context: "./"
-  ubuntu-version: '2404'
 # Stablediffusion-ggml
 - build-type: ''
   cuda-major-version: ""
```


16
.github/workflows/backend_build.yml
vendored

```diff
@@ -208,6 +208,15 @@ jobs:
     username: ${{ secrets.quayUsername }}
     password: ${{ secrets.quayPassword }}

+# Weekly cache-buster for the per-backend `make` step. Most Python
+# backends list unpinned deps (torch, transformers, vllm, ...), so a
+# warm cache freezes upstream versions indefinitely. Rolling this
+# weekly forces a re-resolve of the install layer at most once per
+# week, picking up newer wheels without a full cold rebuild.
+- name: Compute deps refresh key
+  id: deps_refresh
+  run: echo "key=$(date -u +%Y-W%V)" >> "$GITHUB_OUTPUT"
+
 - name: Build and push
   uses: docker/build-push-action@v7
   if: github.event_name != 'pull_request'
@@ -222,9 +231,11 @@ jobs:
       BACKEND=${{ inputs.backend }}
       UBUNTU_VERSION=${{ inputs.ubuntu-version }}
       AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
+      DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
     context: ${{ inputs.context }}
     file: ${{ inputs.dockerfile }}
-    cache-from: type=gha
+    cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
+    cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }},mode=max,ignore-error=true
     platforms: ${{ inputs.platforms }}
     push: ${{ github.event_name != 'pull_request' }}
     tags: ${{ steps.meta.outputs.tags }}
@@ -244,9 +255,10 @@ jobs:
       BACKEND=${{ inputs.backend }}
       UBUNTU_VERSION=${{ inputs.ubuntu-version }}
       AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
+      DEPS_REFRESH=${{ steps.deps_refresh.outputs.key }}
     context: ${{ inputs.context }}
     file: ${{ inputs.dockerfile }}
-    cache-from: type=gha
+    cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache${{ inputs.tag-suffix }}
     platforms: ${{ inputs.platforms }}
     push: ${{ env.quay_username != '' }}
     tags: ${{ steps.meta_pull_request.outputs.tags }}
```


131
.github/workflows/backend_build_darwin.yml
vendored

```diff
@@ -48,6 +48,13 @@ jobs:
 strategy:
   matrix:
     go-version: ['${{ inputs.go-version }}']
+env:
+  # Keep the brew Cellar stable across cache restores. Without these,
+  # `brew install` would auto-update brew itself and re-link formulas,
+  # mutating the very paths the cache just restored.
+  HOMEBREW_NO_AUTO_UPDATE: '1'
+  HOMEBREW_NO_INSTALL_CLEANUP: '1'
+  HOMEBREW_NO_ANALYTICS: '1'
 steps:
   - name: Clone
     uses: actions/checkout@v6
@@ -58,21 +65,141 @@ jobs:
     uses: actions/setup-go@v5
     with:
       go-version: ${{ matrix.go-version }}
-      cache: false
+      # Caches ~/go/pkg/mod and ~/Library/Caches/go-build keyed on go.sum.
+      # Shared across every darwin matrix entry — first job in a run warms
+      # it, the rest hit warm.
+      cache: true

   # You can test your matrix by printing the current Go version
   - name: Display Go version
     run: go version

+  # ---- Homebrew cache ----
+  # macOS runners have no Docker daemon, so the BuildKit registry cache used
+  # for Linux backend images (see .agents/ci-caching.md) doesn't apply here.
+  # We cache the brew downloads + Cellar entries for the formulas we install
+  # below. Read on every run, write only on master/tag pushes — same policy
+  # as the Linux registry cache.
+  - name: Restore Homebrew cache
+    id: brew-cache
+    uses: actions/cache/restore@v4
+    with:
+      path: |
+        ~/Library/Caches/Homebrew/downloads
+        /opt/homebrew/Cellar/protobuf
+        /opt/homebrew/Cellar/grpc
+        /opt/homebrew/Cellar/protoc-gen-go
+        /opt/homebrew/Cellar/protoc-gen-go-grpc
+        /opt/homebrew/Cellar/libomp
+        /opt/homebrew/Cellar/llvm
+        /opt/homebrew/Cellar/ccache
+      key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
+
   - name: Dependencies
     run: |
-      brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm
+      # ccache is always installed (used by the llama-cpp variant build) so
+      # the brew cache content stays stable across every backend in the
+      # matrix — they all share one cache key.
+      brew install protobuf grpc make protoc-gen-go protoc-gen-go-grpc libomp llvm ccache

+  - name: Save Homebrew cache
+    if: github.event_name != 'pull_request' && steps.brew-cache.outputs.cache-hit != 'true'
+    uses: actions/cache/save@v4
+    with:
+      path: |
+        ~/Library/Caches/Homebrew/downloads
+        /opt/homebrew/Cellar/protobuf
+        /opt/homebrew/Cellar/grpc
+        /opt/homebrew/Cellar/protoc-gen-go
+        /opt/homebrew/Cellar/protoc-gen-go-grpc
+        /opt/homebrew/Cellar/libomp
+        /opt/homebrew/Cellar/llvm
+        /opt/homebrew/Cellar/ccache
+      key: brew-${{ runner.os }}-${{ runner.arch }}-v1-${{ hashFiles('.github/workflows/backend_build_darwin.yml') }}
+
+  # ---- ccache for llama.cpp CMake builds ----
+  # Three CMake variants (fallback, grpc, rpc-server) compile the same
+  # llama.cpp source tree with overlapping flags — ccache dedupes object
+  # files across them. Key on the pinned LLAMA_VERSION so a pin bump
+  # invalidates cleanly; restore-keys fall back to the latest entry for the
+  # same pin so unchanged TUs stay warm even when the cache is fresh.
+  - name: Compute llama.cpp version
+    if: inputs.backend == 'llama-cpp'
+    id: llama-version
+    run: |
+      version=$(grep '^LLAMA_VERSION' backend/cpp/llama-cpp/Makefile | head -1 | cut -d= -f2 | cut -d'?' -f1 | tr -d ' ')
+      echo "version=${version}" >> "$GITHUB_OUTPUT"
+
+  - name: Restore ccache
+    if: inputs.backend == 'llama-cpp'
+    id: ccache-cache
+    uses: actions/cache/restore@v4
+    with:
+      path: ~/Library/Caches/ccache
+      key: ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-${{ github.run_id }}
+      restore-keys: |
+        ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-
+
+  - name: Configure ccache
+    if: inputs.backend == 'llama-cpp'
+    run: |
+      mkdir -p "$HOME/Library/Caches/ccache"
+      ccache -M 2G
+      ccache -z
+      # llama-cpp-darwin.sh reads CMAKE_ARGS / CCACHE_DIR from env.
+      {
+        echo "CMAKE_ARGS=${CMAKE_ARGS:-} -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache"
+        echo "CCACHE_DIR=$HOME/Library/Caches/ccache"
+      } >> "$GITHUB_ENV"
+
+  # ---- Python wheel cache (uv + pip) ----
+  # Mirrors the Linux DEPS_REFRESH cadence (see .agents/ci-caching.md): the
+  # ISO-week segment of the cache key forces at most one cold rebuild per
+  # backend per week, automatically picking up newer wheels for unpinned
+  # deps (torch, mlx, diffusers, …). Restore-keys fall back to the most
+  # recent build of the same backend so off-week PRs still hit warm.
+  - name: Compute weekly cache bucket
+    if: inputs.lang == 'python'
+    id: weekly
+    run: echo "bucket=$(date -u +%Y-W%V)" >> "$GITHUB_OUTPUT"
+
+  - name: Restore Python wheel cache
+    if: inputs.lang == 'python'
+    id: pyenv-cache
+    uses: actions/cache/restore@v4
+    with:
+      path: |
+        ~/Library/Caches/pip
+        ~/Library/Caches/uv
+      key: pyenv-darwin-${{ inputs.backend }}-${{ steps.weekly.outputs.bucket }}-${{ hashFiles(format('backend/python/{0}/requirements*.txt', inputs.backend)) }}
+      restore-keys: |
+        pyenv-darwin-${{ inputs.backend }}-
+
   - name: Build ${{ inputs.backend }}-darwin
     run: |
       make protogen-go
       BACKEND=${{ inputs.backend }} BUILD_TYPE=${{ inputs.build-type }} USE_PIP=${{ inputs.use-pip }} make build-darwin-${{ inputs.lang }}-backend

+  - name: ccache stats
+    if: inputs.backend == 'llama-cpp'
+    run: ccache -s
+
+  - name: Save ccache
+    if: inputs.backend == 'llama-cpp' && github.event_name != 'pull_request'
+    uses: actions/cache/save@v4
+    with:
+      path: ~/Library/Caches/ccache
+      key: ccache-llama-${{ runner.arch }}-${{ steps.llama-version.outputs.version }}-${{ github.run_id }}
+
+  - name: Save Python wheel cache
+    if: inputs.lang == 'python' && github.event_name != 'pull_request' && steps.pyenv-cache.outputs.cache-hit != 'true'
+    uses: actions/cache/save@v4
+    with:
+      path: |
+        ~/Library/Caches/pip
+        ~/Library/Caches/uv
+      key: pyenv-darwin-${{ inputs.backend }}-${{ steps.weekly.outputs.bucket }}-${{ hashFiles(format('backend/python/{0}/requirements*.txt', inputs.backend)) }}
+
   - name: Upload ${{ inputs.backend }}.tar
     uses: actions/upload-artifact@v7
     with:
```

||||
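The Python wheel cache steps above build their key from a weekly bucket and a hash of the backend's requirements files. Below is a minimal Bash sketch of that key composition, so it can be tried outside Actions. It assumes GNU `date`, whose `-d` flag lets the date be pinned for testing. The helper names `weekly_bucket`, `cache_key` and `restore_prefix` are hypothetical and not part of the workflow.

One design note. The workflow formats the bucket with `%Y-W%V`, which pairs the calendar year with the ISO week number. A date such as 2024-12-30 belongs to ISO week 1 of 2025, so `%Y-W%V` yields `2024-W01`, the same bucket string as the first week of January 2024. The sketch uses `%G`, the ISO week-based year, which always agrees with `%V`. In practice the older entry has most likely been evicted by then, since GitHub drops cache entries that go unaccessed for a week, so the impact is probably small.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the "Compute weekly cache bucket" step and the
# pyenv-darwin-* cache key. Function names are hypothetical.
# Assumes GNU date (the -d flag is used to inject a fixed date).

# weekly_bucket [DATE]: print "<ISO year>-W<ISO week>" for DATE (default: now).
# %G is the ISO 8601 week-based year, so it always matches %V.
weekly_bucket() {
  local when="${1:-now}"
  date -u -d "$when" +%G-W%V
}

# cache_key BACKEND BUCKET REQ_HASH: compose a key shaped like the workflow's
# pyenv-darwin-<backend>-<bucket>-<hash>. REQ_HASH stands in for the
# hashFiles(...) digest of backend/python/<backend>/requirements*.txt.
cache_key() {
  printf 'pyenv-darwin-%s-%s-%s\n' "$1" "$2" "$3"
}

# restore_prefix BACKEND: the prefix used as a restore-keys fallback, which
# matches any earlier entry for the same backend regardless of week or hash.
restore_prefix() {
  printf 'pyenv-darwin-%s-\n' "$1"
}
```

For example, `cache_key mlx "$(weekly_bucket)" "$req_hash"` produces the exact key. `weekly_bucket 2024-12-30` prints `2025-W01`, whereas `date -u -d 2024-12-30 +%Y-W%V` prints `2024-W01`. On the macOS runner, `date -u +%G-W%V` without `-d` would be the equivalent for the current date.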
2
.github/workflows/gallery-agent.yaml
vendored
@@ -2,7 +2,7 @@ name: Gallery Agent
|
||||
on:
|
||||
|
||||
schedule:
|
||||
- cron: '0 */3 * * *' # Run every 4 hours
|
||||
- cron: '0 */12 * * *' # Run every 12 hours
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
search_term:
|
||||
|
||||
96
.github/workflows/generate_grpc_cache.yaml
vendored
@@ -1,96 +0,0 @@
|
||||
name: 'generate and publish GRPC docker caches'
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
|
||||
schedule:
|
||||
# daily at midnight
|
||||
- cron: '0 0 * * *'
|
||||
|
||||
concurrency:
|
||||
group: grpc-cache-${{ github.head_ref || github.ref }}-${{ github.repository }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
generate_caches:
|
||||
if: github.repository == 'mudler/LocalAI'
|
||||
strategy:
|
||||
matrix:
|
||||
include:
|
||||
- grpc-base-image: ubuntu:24.04
|
||||
runs-on: 'ubuntu-latest'
|
||||
platforms: 'linux/amd64,linux/arm64'
|
||||
runs-on: ${{matrix.runs-on}}
|
||||
steps:
|
||||
- name: Release space from worker
|
||||
if: matrix.runs-on == 'ubuntu-latest'
|
||||
run: |
|
||||
echo "Listing top largest packages"
|
||||
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
|
||||
head -n 30 <<< "${pkgs}"
|
||||
echo
|
||||
df -h
|
||||
echo
|
||||
sudo apt-get remove -y '^llvm-.*|^libllvm.*' || true
|
||||
sudo apt-get remove --auto-remove android-sdk-platform-tools || true
|
||||
sudo apt-get purge --auto-remove android-sdk-platform-tools || true
|
||||
sudo rm -rf /usr/local/lib/android
|
||||
sudo apt-get remove -y '^dotnet-.*|^aspnetcore-.*' || true
|
||||
sudo rm -rf /usr/share/dotnet
|
||||
sudo apt-get remove -y '^mono-.*' || true
|
||||
sudo apt-get remove -y '^ghc-.*' || true
|
||||
sudo apt-get remove -y '.*jdk.*|.*jre.*' || true
|
||||
sudo apt-get remove -y 'php.*' || true
|
||||
sudo apt-get remove -y hhvm powershell firefox monodoc-manual msbuild || true
|
||||
sudo apt-get remove -y '^google-.*' || true
|
||||
sudo apt-get remove -y azure-cli || true
|
||||
sudo apt-get remove -y '^mongo.*-.*|^postgresql-.*|^mysql-.*|^mssql-.*' || true
|
||||
sudo apt-get remove -y '^gfortran-.*' || true
|
||||
sudo apt-get remove -y microsoft-edge-stable || true
|
||||
sudo apt-get remove -y firefox || true
|
||||
sudo apt-get remove -y powershell || true
|
||||
sudo apt-get remove -y r-base-core || true
|
||||
sudo apt-get autoremove -y
|
||||
sudo apt-get clean
|
||||
echo
|
||||
echo "Listing top largest packages"
|
||||
pkgs=$(dpkg-query -Wf '${Installed-Size}\t${Package}\t${Status}\n' | awk '$NF == "installed"{print $1 "\t" $2}' | sort -nr)
|
||||
head -n 30 <<< "${pkgs}"
|
||||
echo
|
||||
sudo rm -rfv build || true
|
||||
sudo rm -rf /usr/share/dotnet || true
|
||||
sudo rm -rf /opt/ghc || true
|
||||
sudo rm -rf "/usr/local/share/boost" || true
|
||||
sudo rm -rf "$AGENT_TOOLSDIRECTORY" || true
|
||||
df -h
|
||||
|
||||
- name: Set up QEMU
|
||||
uses: docker/setup-qemu-action@master
|
||||
with:
|
||||
platforms: all
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
id: buildx
|
||||
uses: docker/setup-buildx-action@master
|
||||
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v6
|
||||
|
||||
- name: Cache GRPC
|
||||
uses: docker/build-push-action@v7
|
||||
with:
|
||||
builder: ${{ steps.buildx.outputs.name }}
|
||||
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
|
||||
# This means that even the MAKEFLAGS have to be an EXACT match.
|
||||
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
|
||||
build-args: |
|
||||
GRPC_BASE_IMAGE=${{ matrix.grpc-base-image }}
|
||||
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
|
||||
GRPC_VERSION=v1.65.0
|
||||
context: .
|
||||
file: ./Dockerfile
|
||||
cache-to: type=gha,ignore-error=true
|
||||
cache-from: type=gha
|
||||
target: grpc
|
||||
platforms: ${{ matrix.platforms }}
|
||||
push: false
|
||||
2
.github/workflows/generate_intel_image.yaml
vendored
@@ -16,7 +16,7 @@ jobs:
|
||||
strategy:
|
||||
matrix:
|
||||
include:
|
||||
- base-image: intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04
|
||||
- base-image: intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04
|
||||
runs-on: 'arc-runner-set'
|
||||
platforms: 'linux/amd64'
|
||||
runs-on: ${{matrix.runs-on}}
|
||||
|
||||
5
.github/workflows/image-pr.yml
vendored
@@ -20,7 +20,6 @@
|
||||
platforms: ${{ matrix.platforms }}
|
||||
runs-on: ${{ matrix.runs-on }}
|
||||
base-image: ${{ matrix.base-image }}
|
||||
grpc-base-image: ${{ matrix.grpc-base-image }}
|
||||
makeflags: ${{ matrix.makeflags }}
|
||||
ubuntu-version: ${{ matrix.ubuntu-version }}
|
||||
secrets:
|
||||
@@ -60,15 +59,13 @@
|
||||
tag-latest: 'false'
|
||||
tag-suffix: '-hipblas'
|
||||
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
|
||||
grpc-base-image: "ubuntu:24.04"
|
||||
runs-on: 'ubuntu-latest'
|
||||
makeflags: "--jobs=3 --output-sync=target"
|
||||
ubuntu-version: '2404'
|
||||
- build-type: 'sycl'
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'false'
|
||||
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
|
||||
grpc-base-image: "ubuntu:24.04"
|
||||
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
|
||||
tag-suffix: 'sycl'
|
||||
runs-on: 'ubuntu-latest'
|
||||
makeflags: "--jobs=3 --output-sync=target"
|
||||
|
||||
9
.github/workflows/image.yml
vendored
@@ -25,7 +25,6 @@
|
||||
platforms: ${{ matrix.platforms }}
|
||||
runs-on: ${{ matrix.runs-on }}
|
||||
base-image: ${{ matrix.base-image }}
|
||||
grpc-base-image: ${{ matrix.grpc-base-image }}
|
||||
makeflags: ${{ matrix.makeflags }}
|
||||
ubuntu-version: ${{ matrix.ubuntu-version }}
|
||||
ubuntu-codename: ${{ matrix.ubuntu-codename }}
|
||||
@@ -42,12 +41,11 @@
|
||||
tag-latest: 'auto'
|
||||
tag-suffix: '-gpu-hipblas'
|
||||
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
|
||||
grpc-base-image: "ubuntu:24.04"
|
||||
runs-on: 'ubuntu-latest'
|
||||
makeflags: "--jobs=3 --output-sync=target"
|
||||
ubuntu-version: '2404'
|
||||
ubuntu-codename: 'noble'
|
||||
|
||||
|
||||
core-image-build:
|
||||
if: github.repository == 'mudler/LocalAI'
|
||||
uses: ./.github/workflows/image_build.yml
|
||||
@@ -60,7 +58,6 @@
|
||||
platforms: ${{ matrix.platforms }}
|
||||
runs-on: ${{ matrix.runs-on }}
|
||||
base-image: ${{ matrix.base-image }}
|
||||
grpc-base-image: ${{ matrix.grpc-base-image }}
|
||||
makeflags: ${{ matrix.makeflags }}
|
||||
skip-drivers: ${{ matrix.skip-drivers }}
|
||||
ubuntu-version: ${{ matrix.ubuntu-version }}
|
||||
@@ -121,8 +118,7 @@
|
||||
- build-type: 'intel'
|
||||
platforms: 'linux/amd64'
|
||||
tag-latest: 'auto'
|
||||
base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
|
||||
grpc-base-image: "ubuntu:24.04"
|
||||
base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"
|
||||
tag-suffix: '-gpu-intel'
|
||||
runs-on: 'ubuntu-latest'
|
||||
makeflags: "--jobs=3 --output-sync=target"
|
||||
@@ -141,7 +137,6 @@
|
||||
platforms: ${{ matrix.platforms }}
|
||||
runs-on: ${{ matrix.runs-on }}
|
||||
base-image: ${{ matrix.base-image }}
|
||||
grpc-base-image: ${{ matrix.grpc-base-image }}
|
||||
makeflags: ${{ matrix.makeflags }}
|
||||
skip-drivers: ${{ matrix.skip-drivers }}
|
||||
ubuntu-version: ${{ matrix.ubuntu-version }}
|
||||
|
||||
24
.github/workflows/image_build.yml
vendored
@@ -8,11 +8,6 @@ on:
|
||||
description: 'Base image'
|
||||
required: true
|
||||
type: string
|
||||
grpc-base-image:
|
||||
description: 'GRPC Base image, must be a compatible image with base-image'
|
||||
required: false
|
||||
default: ''
|
||||
type: string
|
||||
build-type:
|
||||
description: 'Build type'
|
||||
default: ''
|
||||
@@ -201,25 +196,19 @@ jobs:
|
||||
if: github.event_name != 'pull_request'
|
||||
with:
|
||||
builder: ${{ steps.buildx.outputs.name }}
|
||||
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
|
||||
# This means that even the MAKEFLAGS have to be an EXACT match.
|
||||
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
|
||||
# This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
|
||||
build-args: |
|
||||
BUILD_TYPE=${{ inputs.build-type }}
|
||||
CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
|
||||
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
|
||||
BASE_IMAGE=${{ inputs.base-image }}
|
||||
GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
|
||||
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
|
||||
GRPC_VERSION=v1.65.0
|
||||
MAKEFLAGS=${{ inputs.makeflags }}
|
||||
SKIP_DRIVERS=${{ inputs.skip-drivers }}
|
||||
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
|
||||
UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
|
||||
context: .
|
||||
file: ./Dockerfile
|
||||
cache-from: type=gha
|
||||
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
|
||||
cache-to: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }},mode=max,ignore-error=true
|
||||
platforms: ${{ inputs.platforms }}
|
||||
push: ${{ github.event_name != 'pull_request' }}
|
||||
tags: ${{ steps.meta.outputs.tags }}
|
||||
@@ -230,25 +219,18 @@ jobs:
|
||||
if: github.event_name == 'pull_request'
|
||||
with:
|
||||
builder: ${{ steps.buildx.outputs.name }}
|
||||
# The build-args MUST be an EXACT match between the image cache and other workflow steps that want to use that cache.
|
||||
# This means that even the MAKEFLAGS have to be an EXACT match.
|
||||
# If the build-args are not an EXACT match, it will result in a cache miss, which will require GRPC to be built from scratch.
|
||||
# This is why some build args like GRPC_VERSION and MAKEFLAGS are hardcoded
|
||||
build-args: |
|
||||
BUILD_TYPE=${{ inputs.build-type }}
|
||||
CUDA_MAJOR_VERSION=${{ inputs.cuda-major-version }}
|
||||
CUDA_MINOR_VERSION=${{ inputs.cuda-minor-version }}
|
||||
BASE_IMAGE=${{ inputs.base-image }}
|
||||
GRPC_BASE_IMAGE=${{ inputs.grpc-base-image || inputs.base-image }}
|
||||
GRPC_MAKEFLAGS=--jobs=4 --output-sync=target
|
||||
GRPC_VERSION=v1.65.0
|
||||
MAKEFLAGS=${{ inputs.makeflags }}
|
||||
SKIP_DRIVERS=${{ inputs.skip-drivers }}
|
||||
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
|
||||
UBUNTU_CODENAME=${{ inputs.ubuntu-codename }}
|
||||
context: .
|
||||
file: ./Dockerfile
|
||||
cache-from: type=gha
|
||||
cache-from: type=registry,ref=quay.io/go-skynet/ci-cache:cache-localai${{ inputs.tag-suffix }}
|
||||
platforms: ${{ inputs.platforms }}
|
||||
#push: true
|
||||
tags: ${{ steps.meta_pull_request.outputs.tags }}
|
||||
|
||||
25
.github/workflows/test-extra.yml
vendored
@@ -32,7 +32,6 @@ jobs:
|
||||
llama-cpp: ${{ steps.detect.outputs.llama-cpp }}
|
||||
ik-llama-cpp: ${{ steps.detect.outputs.ik-llama-cpp }}
|
||||
turboquant: ${{ steps.detect.outputs.turboquant }}
|
||||
buun-llama-cpp: ${{ steps.detect.outputs['buun-llama-cpp'] }}
|
||||
vllm: ${{ steps.detect.outputs.vllm }}
|
||||
sglang: ${{ steps.detect.outputs.sglang }}
|
||||
acestep-cpp: ${{ steps.detect.outputs.acestep-cpp }}
|
||||
@@ -614,30 +613,6 @@ jobs:
|
||||
- name: Build turboquant backend image and run gRPC e2e tests
|
||||
run: |
|
||||
make test-extra-backend-turboquant
|
||||
tests-buun-llama-cpp-grpc:
|
||||
needs: detect-changes
|
||||
if: needs.detect-changes.outputs['buun-llama-cpp'] == 'true' || needs.detect-changes.outputs.run-all == 'true'
|
||||
runs-on: ubuntu-latest
|
||||
timeout-minutes: 90
|
||||
steps:
|
||||
- name: Clone
|
||||
uses: actions/checkout@v6
|
||||
with:
|
||||
submodules: true
|
||||
- name: Setup Go
|
||||
uses: actions/setup-go@v5
|
||||
with:
|
||||
go-version: '1.25.4'
|
||||
# Exercises the buun-llama-cpp (fork-of-a-fork) backend with the
|
||||
# fork-specific TurboQuant/TCQ KV-cache types. BACKEND_TEST_CACHE_TYPE_V
|
||||
# is set to turbo3 so the test round-trips through the fork's KV
|
||||
# allow-list — picking a stock llama.cpp type would only re-test the
|
||||
# shared code path. DFlash speculative decoding is not exercised here
|
||||
# because the one known public target/drafter pair (Qwen3.5-27B) is too
|
||||
# large for CI.
|
||||
- name: Build buun-llama-cpp backend image and run gRPC e2e tests
|
||||
run: |
|
||||
make test-extra-backend-buun-llama-cpp
|
||||
# tests-vllm-grpc is currently disabled in CI.
|
||||
#
|
||||
# The prebuilt vllm CPU wheel is compiled with AVX-512 VNNI/BF16
|
||||
|
||||
3
.github/workflows/test.yml
vendored
@@ -9,9 +9,6 @@ on:
|
||||
tags:
|
||||
- '*'
|
||||
|
||||
env:
|
||||
GRPC_VERSION: v1.65.0
|
||||
|
||||
concurrency:
|
||||
group: ci-tests-${{ github.head_ref || github.ref }}-${{ github.repository }}
|
||||
cancel-in-progress: true
|
||||
|
||||
@@ -19,6 +19,7 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]
|
||||
|------|-------------|
|
||||
| [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md) | Policy for AI-assisted contributions — licensing, DCO, attribution |
|
||||
| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
|
||||
| [.agents/ci-caching.md](.agents/ci-caching.md) | CI build cache layout (registry-backed BuildKit cache on quay.io/go-skynet/ci-cache), `DEPS_REFRESH` weekly cache-buster for unpinned Python deps, manual eviction |
|
||||
| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist, including importer integration (the `/import-model` dropdown is server-driven from `GET /backends/known`) |
|
||||
| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
|
||||
| [.agents/llama-cpp-backend.md](.agents/llama-cpp-backend.md) | Working on the llama.cpp backend — architecture, updating, tool call parsing |
|
||||
|
||||
@@ -1,5 +1,4 @@
|
||||
ARG BASE_IMAGE=ubuntu:24.04
|
||||
ARG GRPC_BASE_IMAGE=${BASE_IMAGE}
|
||||
ARG INTEL_BASE_IMAGE=${BASE_IMAGE}
|
||||
ARG UBUNTU_CODENAME=noble
|
||||
|
||||
@@ -149,6 +148,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
|
||||
apt-get update && \
|
||||
apt-get install -y --no-install-recommends \
|
||||
hipblas-dev \
|
||||
hipblaslt-dev \
|
||||
rocblas-dev && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/* && \
|
||||
|
||||
25
Makefile
@@ -1,5 +1,5 @@
|
||||
# Disable parallel execution for backend builds
|
||||
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/buun-llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad backends/sherpa-onnx
|
||||
.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/turboquant backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/insightface backends/speaker-recognition backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/sglang backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/acestep-cpp backends/fish-speech backends/voxtral backends/opus backends/trl backends/llama-cpp-quantization backends/kokoros backends/sam3-cpp backends/qwen3-tts-cpp backends/tinygrad backends/sherpa-onnx
|
||||
|
||||
GOCMD=go
|
||||
GOTEST=$(GOCMD) test
|
||||
@@ -545,19 +545,6 @@ test-extra-backend-turboquant: docker-build-turboquant
|
||||
BACKEND_TEST_CACHE_TYPE_V=turbo3 \
|
||||
$(MAKE) test-extra-backend
|
||||
|
||||
## buun-llama-cpp: exercises the fork-of-a-fork backend (spiritbuun/buun-llama-cpp)
|
||||
## with the *TurboQuant/TCQ-specific* KV-cache types (turbo3 for V). Same rationale
|
||||
## as turboquant above: picking a standard llama.cpp type would only re-test the
|
||||
## shared code path. buun inherits turboquant's turbo2/turbo3/turbo4 and adds
|
||||
## turbo2_tcq / turbo3_tcq on top. DFlash speculative decoding is not exercised
|
||||
## here because no small DFlash drafter model exists (the known public pair is
|
||||
## Qwen3.5-27B, ~54 GB).
|
||||
test-extra-backend-buun-llama-cpp: docker-build-buun-llama-cpp
|
||||
BACKEND_IMAGE=local-ai-backend:buun-llama-cpp \
|
||||
BACKEND_TEST_CACHE_TYPE_K=q8_0 \
|
||||
BACKEND_TEST_CACHE_TYPE_V=turbo3 \
|
||||
$(MAKE) test-extra-backend
|
||||
|
||||
## Audio transcription wrapper for the llama-cpp backend.
|
||||
## Drives the new AudioTranscription / AudioTranscriptionStream RPCs against
|
||||
## ggml-org/Qwen3-ASR-0.6B-GGUF (a small ASR model that requires its mmproj
|
||||
@@ -896,7 +883,7 @@ docker-cuda12:
|
||||
|
||||
docker-image-intel:
|
||||
docker build \
|
||||
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04 \
|
||||
--build-arg BASE_IMAGE=intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04 \
|
||||
--build-arg IMAGE_TYPE=$(IMAGE_TYPE) \
|
||||
--build-arg GO_TAGS="$(GO_TAGS)" \
|
||||
--build-arg MAKEFLAGS="$(DOCKER_MAKEFLAGS)" \
|
||||
@@ -962,11 +949,6 @@ BACKEND_IK_LLAMA_CPP = ik-llama-cpp|ik-llama-cpp|.|false|false
|
||||
# turboquant is a llama.cpp fork with TurboQuant KV-cache quantization.
|
||||
# Reuses backend/cpp/llama-cpp grpc-server sources via a thin wrapper Makefile.
|
||||
BACKEND_TURBOQUANT = turboquant|turboquant|.|false|false
|
||||
# buun-llama-cpp is a fork-of-a-fork (spiritbuun/buun-llama-cpp forks
|
||||
# TheTom/llama-cpp-turboquant) that adds DFlash block-diffusion speculative
|
||||
# decoding and extra TCQ KV-cache variants on top of TurboQuant. Same thin
|
||||
# wrapper pattern as turboquant — reuses backend/cpp/llama-cpp grpc-server.
|
||||
BACKEND_BUUN_LLAMA_CPP = buun-llama-cpp|buun-llama-cpp|.|false|false
|
||||
|
||||
# Golang backends
|
||||
BACKEND_PIPER = piper|golang|.|false|true
|
||||
@@ -1047,7 +1029,6 @@ endef
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_LLAMA_CPP)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_IK_LLAMA_CPP)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_TURBOQUANT)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_BUUN_LLAMA_CPP)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_PIPER)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_LOCAL_STORE)))
|
||||
$(eval $(call generate-docker-build-target,$(BACKEND_HUGGINGFACE)))
|
||||
@@ -1099,7 +1080,7 @@ $(eval $(call generate-docker-build-target,$(BACKEND_SHERPA_ONNX)))
|
||||
docker-save-%: backend-images
|
||||
docker save local-ai-backend:$* -o backend-images/$*.tar
|
||||
|
||||
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-buun-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp docker-build-insightface docker-build-speaker-recognition docker-build-sherpa-onnx
|
||||
docker-build-backends: docker-build-llama-cpp docker-build-ik-llama-cpp docker-build-turboquant docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-sglang docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-fish-speech docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-acestep-cpp docker-build-voxtral docker-build-mlx-distributed docker-build-trl docker-build-llama-cpp-quantization docker-build-tinygrad docker-build-kokoros docker-build-sam3-cpp docker-build-qwen3-tts-cpp docker-build-insightface docker-build-speaker-recognition docker-build-sherpa-onnx
|
||||
|
||||
########################################################
|
||||
### Mock Backend for E2E Tests
|
||||
|
||||
@@ -1,290 +0,0 @@
|
||||
ARG BASE_IMAGE=ubuntu:24.04
|
||||
ARG GRPC_BASE_IMAGE=${BASE_IMAGE}
|
||||
|
||||
|
||||
# The grpc target does one thing: it builds and installs GRPC. This is in its own layer so that it can be effectively cached by CI.
|
||||
# You probably don't need to change anything here, and if you do, make sure that CI is adjusted so that the cache continues to work.
|
||||
FROM ${GRPC_BASE_IMAGE} AS grpc
|
||||
|
||||
# This is a bit of a hack, but it's required in order to be able to effectively cache this layer in CI
|
||||
ARG GRPC_MAKEFLAGS="-j4 -Otarget"
|
||||
ARG GRPC_VERSION=v1.65.0
|
||||
ARG CMAKE_FROM_SOURCE=false
|
||||
# CUDA Toolkit 13.x compatibility: CMake 3.31.9+ fixes toolchain detection/arch table issues
|
||||
ARG CMAKE_VERSION=3.31.10
|
||||
|
||||
ENV MAKEFLAGS=${GRPC_MAKEFLAGS}
|
||||
|
||||
WORKDIR /build
|
||||
|
||||
RUN apt-get update && \
|
||||
apt-get install -y --no-install-recommends \
|
||||
ca-certificates \
|
||||
build-essential curl libssl-dev \
|
||||
git wget && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# Install CMake (the version in 22.04 is too old)
|
||||
RUN <<EOT bash
|
||||
if [ "${CMAKE_FROM_SOURCE}" = "true" ]; then
|
||||
curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
|
||||
else
|
||||
apt-get update && \
|
||||
apt-get install -y \
|
||||
cmake && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/*
|
||||
fi
|
||||
EOT
|
||||
|
||||
# We install GRPC to a different prefix here so that we can copy in only the build artifacts later
|
||||
# saves several hundred MB on the final docker image size vs copying in the entire GRPC source tree
|
||||
# and running make install in the target container
|
||||
RUN git clone --recurse-submodules --jobs 4 -b ${GRPC_VERSION} --depth 1 --shallow-submodules https://github.com/grpc/grpc && \
|
||||
mkdir -p /build/grpc/cmake/build && \
|
||||
cd /build/grpc/cmake/build && \
|
||||
sed -i "216i\ TESTONLY" "../../third_party/abseil-cpp/absl/container/CMakeLists.txt" && \
|
||||
cmake -DgRPC_INSTALL=ON -DgRPC_BUILD_TESTS=OFF -DCMAKE_INSTALL_PREFIX:PATH=/opt/grpc ../.. && \
|
||||
make && \
|
||||
make install && \
|
||||
rm -rf /build
|
||||
|
||||
FROM ${BASE_IMAGE} AS builder
|
||||
ARG CMAKE_FROM_SOURCE=false
|
||||
ARG CMAKE_VERSION=3.31.10
|
||||
# We can target specific CUDA ARCHITECTURES like --build-arg CUDA_DOCKER_ARCH='75;86;89;120'
|
||||
ARG CUDA_DOCKER_ARCH
|
||||
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
|
||||
ARG CMAKE_ARGS
|
||||
ENV CMAKE_ARGS=${CMAKE_ARGS}
|
||||
ARG BACKEND=rerankers
|
||||
ARG BUILD_TYPE
|
||||
ENV BUILD_TYPE=${BUILD_TYPE}
|
||||
ARG CUDA_MAJOR_VERSION
|
||||
ARG CUDA_MINOR_VERSION
|
||||
ARG SKIP_DRIVERS=false
|
||||
ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
|
||||
ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
|
||||
ENV DEBIAN_FRONTEND=noninteractive
|
||||
ARG TARGETARCH
|
||||
ARG TARGETVARIANT
|
||||
ARG GO_VERSION=1.25.4
|
||||
ARG UBUNTU_VERSION=2404
|
||||
|
||||
RUN apt-get update && \
|
||||
apt-get install -y --no-install-recommends \
|
||||
build-essential \
|
||||
ccache git \
|
||||
ca-certificates \
|
||||
make \
|
||||
pkg-config libcurl4-openssl-dev \
|
||||
curl unzip \
|
||||
libssl-dev wget && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# Cuda
|
||||
ENV PATH=/usr/local/cuda/bin:${PATH}
|
||||
|
||||
# HipBLAS requirements
|
||||
ENV PATH=/opt/rocm/bin:${PATH}
|
||||
|
||||
|
||||
# Vulkan requirements
|
||||
RUN <<EOT bash
|
||||
if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
|
||||
apt-get update && \
|
||||
apt-get install -y --no-install-recommends \
|
||||
software-properties-common pciutils wget gpg-agent && \
|
||||
apt-get install -y libglm-dev cmake libxcb-dri3-0 libxcb-present0 libpciaccess0 \
|
||||
libpng-dev libxcb-keysyms1-dev libxcb-dri3-dev libx11-dev g++ gcc \
|
||||
libwayland-dev libxrandr-dev libxcb-randr0-dev libxcb-ewmh-dev \
|
||||
git python-is-python3 bison libx11-xcb-dev liblz4-dev libzstd-dev \
|
||||
ocaml-core ninja-build pkg-config libxml2-dev wayland-protocols python3-jsonschema \
|
||||
clang-format qtbase5-dev qt6-base-dev libxcb-glx0-dev sudo xz-utils
|
||||
if [ "amd64" = "$TARGETARCH" ]; then
|
||||
wget "https://sdk.lunarg.com/sdk/download/1.4.335.0/linux/vulkansdk-linux-x86_64-1.4.335.0.tar.xz" && \
|
||||
tar -xf vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
|
||||
rm vulkansdk-linux-x86_64-1.4.335.0.tar.xz && \
|
||||
mkdir -p /opt/vulkan-sdk && \
|
||||
mv 1.4.335.0 /opt/vulkan-sdk/ && \
|
||||
cd /opt/vulkan-sdk/1.4.335.0 && \
|
||||
./vulkansdk --no-deps --maxjobs \
|
||||
vulkan-loader \
|
||||
vulkan-validationlayers \
|
||||
vulkan-extensionlayer \
|
||||
vulkan-tools \
|
||||
shaderc && \
|
||||
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/bin/* /usr/bin/ && \
|
||||
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/lib/* /usr/lib/x86_64-linux-gnu/ && \
|
||||
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/include/* /usr/include/ && \
|
||||
cp -rfv /opt/vulkan-sdk/1.4.335.0/x86_64/share/* /usr/share/ && \
|
||||
rm -rf /opt/vulkan-sdk
|
||||
fi
|
||||
if [ "arm64" = "$TARGETARCH" ]; then
|
||||
mkdir vulkan && cd vulkan && \
|
||||
curl -L -o vulkan-sdk.tar.xz https://github.com/mudler/vulkan-sdk-arm/releases/download/1.4.335.0/vulkansdk-ubuntu-24.04-arm-1.4.335.0.tar.xz && \
|
||||
tar -xvf vulkan-sdk.tar.xz && \
|
||||
rm vulkan-sdk.tar.xz && \
|
||||
cd 1.4.335.0 && \
|
||||
cp -rfv aarch64/bin/* /usr/bin/ && \
|
||||
cp -rfv aarch64/lib/* /usr/lib/aarch64-linux-gnu/ && \
|
||||
cp -rfv aarch64/include/* /usr/include/ && \
|
||||
cp -rfv aarch64/share/* /usr/share/ && \
|
||||
cd ../.. && \
|
||||
rm -rf vulkan
|
||||
fi
|
||||
ldconfig && \
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/*
|
||||
fi
|
||||
EOT
|
||||
|
||||
# CuBLAS requirements
|
||||
RUN <<EOT bash
|
||||
if ( [ "${BUILD_TYPE}" = "cublas" ] || [ "${BUILD_TYPE}" = "l4t" ] ) && [ "${SKIP_DRIVERS}" = "false" ]; then
|
||||
apt-get update && \
|
||||
apt-get install -y --no-install-recommends \
|
||||
software-properties-common pciutils
|
||||
if [ "amd64" = "$TARGETARCH" ]; then
|
||||
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/x86_64/cuda-keyring_1.1-1_all.deb
|
||||
fi
|
||||
if [ "arm64" = "$TARGETARCH" ]; then
|
||||
if [ "${CUDA_MAJOR_VERSION}" = "13" ]; then
|
||||
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/sbsa/cuda-keyring_1.1-1_all.deb
|
||||
else
|
||||
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${UBUNTU_VERSION}/arm64/cuda-keyring_1.1-1_all.deb
|
||||
fi
|
||||
fi
|
||||
dpkg -i cuda-keyring_1.1-1_all.deb && \
|
||||
rm -f cuda-keyring_1.1-1_all.deb && \
|
||||
apt-get update && \
|
||||
apt-get install -y --no-install-recommends \
|
||||
cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
|
||||
libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
|
||||
libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
|
||||
libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
|
||||
libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
|
||||
libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
|
||||
if [ "${CUDA_MAJOR_VERSION}" = "13" ] && [ "arm64" = "$TARGETARCH" ]; then
|
||||
apt-get install -y --no-install-recommends \
|
||||
libcufile-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libcudnn9-cuda-${CUDA_MAJOR_VERSION} cuda-cupti-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} libnvjitlink-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION}
|
||||
fi
|
||||
apt-get clean && \
|
||||
rm -rf /var/lib/apt/lists/*
|
||||
fi
|
||||
EOT
|
||||
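The CuBLAS step above picks the NVIDIA repository directory from `TARGETARCH` and the CUDA major version: `x86_64` for amd64, `sbsa` for arm64 with CUDA 13, and `arm64` otherwise. Below is a minimal Bash sketch of that selection. `cuda_repo_arch` and `cuda_keyring_url` are hypothetical helper names. The URL layout is copied from the `curl -O` lines above, not taken from NVIDIA documentation.

Unlike the inline version, this helper fails immediately on an unrecognized architecture. In the inline version, no keyring is downloaded in that case and the error only surfaces later, when `dpkg -i` cannot find the file.

```bash
#!/usr/bin/env bash
# Sketch: choose the CUDA apt repo architecture directory the same way the
# CuBLAS RUN step does. Helper names are hypothetical.

# cuda_repo_arch TARGETARCH CUDA_MAJOR_VERSION
cuda_repo_arch() {
  case "$1" in
    amd64) echo x86_64 ;;
    arm64)
      # CUDA 13 on arm64 uses the sbsa repo; older majors use arm64.
      if [ "$2" = "13" ]; then echo sbsa; else echo arm64; fi
      ;;
    *)
      echo "unsupported TARGETARCH: $1" >&2
      return 1
      ;;
  esac
}

# cuda_keyring_url UBUNTU_VERSION TARGETARCH CUDA_MAJOR_VERSION
cuda_keyring_url() {
  local arch
  arch=$(cuda_repo_arch "$2" "$3") || return 1
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu%s/%s/cuda-keyring_1.1-1_all.deb\n' \
    "$1" "$arch"
}
```

Inside the `RUN <<EOT bash` heredoc, a call such as `curl -O "$(cuda_keyring_url "${UBUNTU_VERSION}" "${TARGETARCH}" "${CUDA_MAJOR_VERSION}")"` would replace the nested `if` blocks.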
|
||||
|
||||
# https://github.com/NVIDIA/Isaac-GR00T/issues/343
|
||||
RUN <<EOT bash
|
||||
if [ "${BUILD_TYPE}" = "cublas" ] && [ "${TARGETARCH}" = "arm64" ]; then
|
||||
wget https://developer.download.nvidia.com/compute/cudss/0.6.0/local_installers/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
|
||||
dpkg -i cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0_0.6.0-1_arm64.deb && \
|
||||
cp /var/cudss-local-tegra-repo-ubuntu${UBUNTU_VERSION}-0.6.0/cudss-*-keyring.gpg /usr/share/keyrings/ && \
|
||||
apt-get update && apt-get -y install cudss cudss-cuda-${CUDA_MAJOR_VERSION} && \
|
||||
wget https://developer.download.nvidia.com/compute/nvpl/25.5/local_installers/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
|
||||
dpkg -i nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5_1.0-1_arm64.deb && \
|
||||
cp /var/nvpl-local-repo-ubuntu${UBUNTU_VERSION}-25.5/nvpl-*-keyring.gpg /usr/share/keyrings/ && \
|
||||
apt-get update && apt-get install -y nvpl
|
||||
fi
|
||||
EOT
|
||||
|
||||
# If we are building with clblas support, we need the libraries for the builds
RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
libclblast-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* \
; fi

RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
# I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
# to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
ldconfig && \
# Log which GPU architectures have rocBLAS kernel support
echo "rocBLAS library data architectures:" && \
(ls /opt/rocm*/lib/rocblas/library/Kernels* 2>/dev/null || ls /opt/rocm*/lib64/rocblas/library/Kernels* 2>/dev/null) | grep -oP 'gfx[0-9a-z+-]+' | sort -u || \
echo "WARNING: No rocBLAS kernel data found" \
; fi

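The logging step above extracts `gfx…` targets from the Tensile kernel file names with `grep -oP 'gfx[0-9a-z+-]+' | sort -u`. Below is a minimal standalone sketch of the same extraction. The `list_gfx_targets` name and the sample file names are made up.

It differs from the Dockerfile in two ways:
- It uses `grep -oE`, so it also runs where grep has no PCRE support.
- It checks for empty output explicitly. As far as we can tell, the Dockerfile's `|| echo "WARNING…"` branch depends on the exit status of `sort`, which is 0 even on empty input, so that warning may never print.

```shell
# Sketch (not LocalAI code): read kernel file names on stdin and print the
# distinct gfx targets they mention, or warn and fail if there are none.
list_gfx_targets() {
  local found
  # -o prints only the matched text; LC_ALL=C makes the sort order locale-independent.
  found=$(grep -oE 'gfx[0-9a-z+-]+' | LC_ALL=C sort -u)
  if [ -z "$found" ]; then
    echo "WARNING: No rocBLAS kernel data found" >&2
    return 1
  fi
  printf '%s\n' "$found"
}

# Example with made-up file names:
printf '%s\n' Kernels.so-000-gfx1100.hsaco TensileLibrary_gfx1100.dat Kernels.so-000-gfx90a.hsaco |
  list_gfx_targets
```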
RUN echo "TARGETARCH: $TARGETARCH"

# We need protoc installed, and the version in 22.04 is too old. We will create one as part of installing the GRPC build below
# but that will also bring in a newer version of absl which stablediffusion cannot compile with. This version of protoc is only
# here so that we can generate the grpc code for the stablediffusion build
RUN <<EOT bash
if [ "amd64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-x86_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
if [ "arm64" = "$TARGETARCH" ]; then
curl -L -s https://github.com/protocolbuffers/protobuf/releases/download/v27.1/protoc-27.1-linux-aarch_64.zip -o protoc.zip && \
unzip -j -d /usr/local/bin protoc.zip bin/protoc && \
rm protoc.zip
fi
EOT

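The two `if` branches above differ only in the release-asset suffix. Here is a small sketch of that mapping. It assumes the `protoc-<version>-<suffix>.zip` naming visible in the two URLs above, and the helper names are hypothetical. Unlike the Dockerfile, which silently skips protoc on other architectures, the sketch rejects unknown values explicitly.

```shell
# Hypothetical helper: map Docker's TARGETARCH to the protoc asset suffix.
protoc_asset_suffix() {
  case "$1" in
    amd64) echo "linux-x86_64" ;;
    arm64) echo "linux-aarch_64" ;;
    *) return 1 ;;  # no protoc download for other architectures
  esac
}

# Build the download URL for a given protoc version and TARGETARCH.
protoc_url() {
  local version=$1 arch=$2 suffix
  suffix=$(protoc_asset_suffix "$arch") || return 1
  echo "https://github.com/protocolbuffers/protobuf/releases/download/v${version}/protoc-${version}-${suffix}.zip"
}

protoc_url 27.1 arm64
```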
# Install CMake (the version in 22.04 is too old)
RUN <<EOT bash
if [ "${CMAKE_FROM_SOURCE}" = "true" ]; then
curl -L -s https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz -o cmake.tar.gz && tar xvf cmake.tar.gz && cd cmake-${CMAKE_VERSION} && ./configure && make && make install
else
apt-get update && \
apt-get install -y \
cmake && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
fi
EOT

COPY --from=grpc /opt/grpc /usr/local


COPY . /LocalAI

RUN <<'EOT' bash
set -euxo pipefail

if [[ -n "${CUDA_DOCKER_ARCH:-}" ]]; then
CUDA_ARCH_ESC="${CUDA_DOCKER_ARCH//;/\\;}"
export CMAKE_ARGS="${CMAKE_ARGS:-} -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH_ESC}"
echo "CMAKE_ARGS(env) = ${CMAKE_ARGS}"
rm -rf /LocalAI/backend/cpp/buun-llama-cpp-*-build
fi

cd /LocalAI/backend/cpp/buun-llama-cpp

if [ "${TARGETARCH}" = "arm64" ] || [ "${BUILD_TYPE}" = "hipblas" ]; then
make buun-llama-cpp-fallback
make buun-llama-cpp-grpc
make buun-llama-cpp-rpc-server
else
make buun-llama-cpp-avx
make buun-llama-cpp-avx2
make buun-llama-cpp-avx512
make buun-llama-cpp-fallback
make buun-llama-cpp-grpc
make buun-llama-cpp-rpc-server
fi
EOT


# Copy libraries using a script to handle architecture differences
RUN make -BC /LocalAI/backend/cpp/buun-llama-cpp package


FROM scratch


# Copy all available binaries (the build process only creates the appropriate ones for the target architecture)
COPY --from=builder /LocalAI/backend/cpp/buun-llama-cpp/package/. ./
@@ -147,6 +147,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

@@ -204,6 +204,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

@@ -206,6 +206,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

@@ -162,6 +162,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
@@ -202,6 +203,13 @@ COPY scripts/build/package-gpu-libs.sh /package-gpu-libs.sh
ARG FROM_SOURCE=""
ENV FROM_SOURCE=${FROM_SOURCE}

# Cache-buster for the per-backend `make` step. Most Python backends list
# unpinned deps (torch, transformers, vllm, ...), so a warm registry cache
# would otherwise freeze upstream versions indefinitely. CI passes a value
# that rolls weekly so the install layer is rebuilt at most once per week
# and picks up newer wheels from PyPI / nightly indexes.
ARG DEPS_REFRESH=initial

RUN cd /${BACKEND} && PORTABLE_PYTHON=true make

# Package GPU libraries into the backend's lib directory
@@ -216,4 +224,4 @@ RUN if [ -f "/${BACKEND}/package.sh" ]; then \

FROM scratch
ARG BACKEND=rerankers
COPY --from=builder /${BACKEND}/ /
COPY --from=builder /${BACKEND}/ /

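The `DEPS_REFRESH` comment says that CI passes a value that rolls weekly. The actual CI expression is not part of this diff, so the following is only a sketch under that assumption: an ISO year-week string is one value with that property. According to the Dockerfile reference, an `ARG` whose value differs from the previous build causes a cache miss at its first use, and `RUN` instructions after an `ARG` use it implicitly. The `make` layer would therefore be rebuilt once the week changes. The helper name is hypothetical.

```shell
# Hypothetical helper: ISO 8601 week-based year and week number in UTC,
# e.g. "2026-W19". %G/%V (not %Y/%U) keep the value stable for a whole
# Monday-to-Sunday week, including the weeks around New Year.
deps_refresh_value() {
  date -u +%G-W%V
}

# Illustrative use; the rest of the docker build invocation is omitted.
echo "--build-arg DEPS_REFRESH=$(deps_refresh_value)"
```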
@@ -204,6 +204,7 @@ RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
apt-get update && \
apt-get install -y --no-install-recommends \
hipblas-dev \
hipblaslt-dev \
rocblas-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \

@@ -1,85 +0,0 @@

# Pinned to the HEAD of master on https://github.com/spiritbuun/buun-llama-cpp.
# Auto-bumped nightly by .github/workflows/bump_deps.yaml.
BUUN_LLAMA_VERSION?=22464d0848b87c5d56b52fdf6af2e5da46bf803e
LLAMA_REPO?=https://github.com/spiritbuun/buun-llama-cpp

CMAKE_ARGS?=
BUILD_TYPE?=
NATIVE?=false
ONEAPI_VARS?=/opt/intel/oneapi/setvars.sh
TARGET?=--target grpc-server
JOBS?=$(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)
ARCH?=$(shell uname -m)

CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
LLAMA_CPP_DIR := $(CURRENT_MAKEFILE_DIR)/../llama-cpp

GREEN := \033[0;32m
RESET := \033[0m

# buun-llama-cpp is a llama.cpp fork-of-a-fork (spiritbuun/buun-llama-cpp forked
# TheTom/llama-cpp-turboquant, which itself forked ggml-org/llama.cpp). Rather
# than duplicating grpc-server.cpp / CMakeLists.txt / prepare.sh we reuse the
# ones in backend/cpp/llama-cpp, and only swap which repo+sha the fetch step
# pulls. Each flavor target copies ../llama-cpp into a sibling
# ../buun-llama-cpp-<flavor>-build directory, then invokes llama-cpp's own
# build-llama-cpp-grpc-server with LLAMA_REPO/LLAMA_VERSION overridden to point
# at the fork.
PATCHES_DIR := $(CURRENT_MAKEFILE_DIR)/patches

# Each flavor target:
# 1. copies backend/cpp/llama-cpp/ (grpc-server.cpp + prepare.sh + CMakeLists.txt + Makefile)
# into a sibling buun-llama-cpp-<flavor>-build directory;
# 2. clones the buun fork into buun-llama-cpp-<flavor>-build/llama.cpp via the
# copy's own `llama.cpp` target, overriding LLAMA_REPO/LLAMA_VERSION;
# 3. applies patches from backend/cpp/buun-llama-cpp/patches/ to the cloned
# fork sources (for backporting upstream commits the fork hasn't pulled);
# 4. runs the copy's `grpc-server` target, which produces the binary we copy
# up as buun-llama-cpp-<flavor>.
define buun-llama-cpp-build
rm -rf $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build
cp -rf $(LLAMA_CPP_DIR) $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build purge
# Augment the copied grpc-server.cpp's KV-cache allow-list with the
# fork's turbo2/turbo3/turbo4/turbo2_tcq/turbo3_tcq types and wire up the
# DFlash-specific option handlers (tree_budget / draft_topk). We patch the
# *copy*, never the original under backend/cpp/llama-cpp/, so the stock
# llama-cpp build stays compiling against vanilla upstream.
bash $(CURRENT_MAKEFILE_DIR)/patch-grpc-server.sh $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build/grpc-server.cpp
$(info $(GREEN)I buun-llama-cpp build info:$(1)$(RESET))
LLAMA_REPO=$(LLAMA_REPO) LLAMA_VERSION=$(BUUN_LLAMA_VERSION) \
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build llama.cpp
bash $(CURRENT_MAKEFILE_DIR)/apply-patches.sh $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build/llama.cpp $(PATCHES_DIR)
CMAKE_ARGS="$(CMAKE_ARGS) $(2)" TARGET="$(3)" \
LLAMA_REPO=$(LLAMA_REPO) LLAMA_VERSION=$(BUUN_LLAMA_VERSION) \
$(MAKE) -C $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build grpc-server
cp -rfv $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-$(1)-build/grpc-server buun-llama-cpp-$(1)
endef

buun-llama-cpp-avx2:
$(call buun-llama-cpp-build,avx2,-DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on,--target grpc-server)

buun-llama-cpp-avx512:
$(call buun-llama-cpp-build,avx512,-DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on,--target grpc-server)

buun-llama-cpp-avx:
$(call buun-llama-cpp-build,avx,-DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off,--target grpc-server)

buun-llama-cpp-fallback:
$(call buun-llama-cpp-build,fallback,-DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off,--target grpc-server)

buun-llama-cpp-grpc:
$(call buun-llama-cpp-build,grpc,-DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off,--target grpc-server --target rpc-server)

buun-llama-cpp-rpc-server: buun-llama-cpp-grpc
cp -rf $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-grpc-build/llama.cpp/build/bin/rpc-server buun-llama-cpp-rpc-server

package:
bash package.sh

purge:
rm -rf $(CURRENT_MAKEFILE_DIR)/../buun-llama-cpp-*-build
rm -rf buun-llama-cpp-* package

clean: purge
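Each flavor target differs only in the CMake flags it passes to `buun-llama-cpp-build` as `$(2)`. The hypothetical lookup below restates those flags, copied from the targets above, as a shell function. This can help when reproducing one flavor's configuration by hand. `rpc-server` has no entry because that target only copies the `rpc-server` binary out of the `grpc` build directory.

```shell
# Hypothetical helper: CMake flags per buun-llama-cpp flavor, as passed to
# $(call buun-llama-cpp-build,<flavor>,<flags>,<targets>) in the Makefile above.
flavor_cmake_flags() {
  case "$1" in
    avx)      echo "-DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" ;;
    avx2)     echo "-DGGML_AVX=on -DGGML_AVX2=on -DGGML_AVX512=off -DGGML_FMA=on -DGGML_F16C=on" ;;
    avx512)   echo "-DGGML_AVX=on -DGGML_AVX2=off -DGGML_AVX512=on -DGGML_FMA=on -DGGML_F16C=on" ;;
    fallback) echo "-DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" ;;
    grpc)     echo "-DGGML_RPC=ON -DGGML_AVX=off -DGGML_AVX2=off -DGGML_AVX512=off -DGGML_FMA=off -DGGML_F16C=off -DGGML_BMI2=off" ;;
    *)        echo "unknown flavor: $1" >&2; return 1 ;;
  esac
}

# Like the Makefile, append the flavor flags to any user-supplied CMAKE_ARGS.
CMAKE_ARGS="${CMAKE_ARGS:-} $(flavor_cmake_flags avx2)"
echo "CMAKE_ARGS=$CMAKE_ARGS"
```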
@@ -1,50 +0,0 @@
#!/bin/bash
# Apply the buun-llama-cpp patch series to a cloned buun-llama-cpp checkout.
#
# buun-llama-cpp is a fork-of-a-fork that branched off upstream llama.cpp
# before some API changes the shared backend/cpp/llama-cpp/grpc-server.cpp
# depends on. We carry those upstream commits as patch files under
# backend/cpp/buun-llama-cpp/patches/ and apply them here so the reused
# grpc-server source compiles against the fork unmodified.
#
# Drop the corresponding patch from patches/ whenever the fork catches up with
# upstream — the build will fail fast if a patch stops applying, which is the
# signal to retire it.

set -euo pipefail

if [[ $# -ne 2 ]]; then
echo "usage: $0 <llama.cpp-src-dir> <patches-dir>" >&2
exit 2
fi

SRC_DIR=$1
PATCHES_DIR=$2

if [[ ! -d "$SRC_DIR" ]]; then
echo "source dir does not exist: $SRC_DIR" >&2
exit 2
fi

if [[ ! -d "$PATCHES_DIR" ]]; then
echo "no patches dir at $PATCHES_DIR, nothing to apply"
exit 0
fi

shopt -s nullglob
patches=("$PATCHES_DIR"/*.patch)
shopt -u nullglob

if [[ ${#patches[@]} -eq 0 ]]; then
echo "no .patch files in $PATCHES_DIR, nothing to apply"
exit 0
fi

cd "$SRC_DIR"

for patch in "${patches[@]}"; do
echo "==> applying $patch"
git apply --verbose "$patch"
done

echo "all buun-llama-cpp patches applied successfully"
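`apply-patches.sh` relies on two bash behaviors:
- With `nullglob` set, a glob that matches nothing expands to an empty list instead of the literal pattern.
- Pathname expansion returns matches in sorted order, so patches are applied in file-name order.

The sketch below isolates the collection step so both behaviors can be checked without a llama.cpp checkout. The `list_patches` name and the numbered file names are assumptions for illustration.

```shell
# Hypothetical helper mirroring apply-patches.sh: print the *.patch files in
# DIR, one per line, in the order they would be applied. Prints nothing
# (rather than the literal "DIR/*.patch") when there are none.
list_patches() {
  local dir=$1
  local -a found
  shopt -s nullglob
  found=("$dir"/*.patch)
  shopt -u nullglob
  if (( ${#found[@]} > 0 )); then
    printf '%s\n' "${found[@]}"
  fi
}

# Example: a temporary directory with two patches and an unrelated file.
demo=$(mktemp -d)
touch "$demo/0002-second.patch" "$demo/0001-first.patch" "$demo/README"
list_patches "$demo"
```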
@@ -1,57 +0,0 @@
#!/bin/bash

# Script to copy the appropriate libraries based on architecture
# This script is used in the final stage of the Dockerfile

set -e

CURDIR=$(dirname "$(realpath $0)")
REPO_ROOT="${CURDIR}/../../.."

# Create lib directory
mkdir -p $CURDIR/package/lib

cp -avrf $CURDIR/buun-llama-cpp-* $CURDIR/package/
cp -rfv $CURDIR/run.sh $CURDIR/package/

# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so
cp -arfLv /lib/x86_64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/x86_64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/x86_64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/x86_64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/x86_64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/x86_64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/x86_64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
elif [ -f "/lib/ld-linux-aarch64.so.1" ]; then
# ARM64 architecture
echo "Detected ARM64 architecture, copying ARM64 libraries..."
cp -arfLv /lib/ld-linux-aarch64.so.1 $CURDIR/package/lib/ld.so
cp -arfLv /lib/aarch64-linux-gnu/libc.so.6 $CURDIR/package/lib/libc.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgcc_s.so.1 $CURDIR/package/lib/libgcc_s.so.1
cp -arfLv /lib/aarch64-linux-gnu/libstdc++.so.6 $CURDIR/package/lib/libstdc++.so.6
cp -arfLv /lib/aarch64-linux-gnu/libm.so.6 $CURDIR/package/lib/libm.so.6
cp -arfLv /lib/aarch64-linux-gnu/libgomp.so.1 $CURDIR/package/lib/libgomp.so.1
cp -arfLv /lib/aarch64-linux-gnu/libdl.so.2 $CURDIR/package/lib/libdl.so.2
cp -arfLv /lib/aarch64-linux-gnu/librt.so.1 $CURDIR/package/lib/librt.so.1
cp -arfLv /lib/aarch64-linux-gnu/libpthread.so.0 $CURDIR/package/lib/libpthread.so.0
else
echo "Error: Could not detect architecture"
exit 1
fi

# Package GPU libraries based on BUILD_TYPE
GPU_LIB_SCRIPT="${REPO_ROOT}/scripts/build/package-gpu-libs.sh"
if [ -f "$GPU_LIB_SCRIPT" ]; then
echo "Packaging GPU libraries for BUILD_TYPE=${BUILD_TYPE:-cpu}..."
source "$GPU_LIB_SCRIPT" "$CURDIR/package/lib"
package_gpu_libs
fi

echo "Packaging completed successfully"
ls -liah $CURDIR/package/
ls -liah $CURDIR/package/lib/
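The two branches of `package.sh` copy the same nine files and differ only in the loader path and the multiarch directory. Below is a data-driven sketch of that selection. It is a hypothetical restructuring, not LocalAI code. It takes an optional root directory so it can be exercised against a fake filesystem, and it prints source/destination pairs instead of copying anything.

```shell
# Runtime libraries package.sh copies on both architectures (list taken from the script above).
RUNTIME_LIBS=(libc.so.6 libgcc_s.so.1 libstdc++.so.6 libm.so.6
              libgomp.so.1 libdl.so.2 librt.so.1 libpthread.so.0)

# runtime_lib_plan [ROOT]: print "<source> <dest-name>" for the dynamic loader
# and each runtime library, or fail if neither known loader exists under ROOT.
runtime_lib_plan() {
  local root=${1:-} loader triple lib
  if [ -f "$root/lib64/ld-linux-x86-64.so.2" ]; then
    loader="$root/lib64/ld-linux-x86-64.so.2"; triple=x86_64-linux-gnu
  elif [ -f "$root/lib/ld-linux-aarch64.so.1" ]; then
    loader="$root/lib/ld-linux-aarch64.so.1"; triple=aarch64-linux-gnu
  else
    echo "Error: Could not detect architecture" >&2
    return 1
  fi
  echo "$loader ld.so"  # the loader is always renamed to lib/ld.so
  for lib in "${RUNTIME_LIBS[@]}"; do
    echo "$root/lib/$triple/$lib $lib"
  done
}
```

A caller could then feed each pair to `cp -arfLv "$src" "$CURDIR/package/lib/$dest"`, keeping a single copy loop for both architectures.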
@@ -1,162 +0,0 @@
#!/bin/bash
# Patch the shared backend/cpp/llama-cpp/grpc-server.cpp *copy* used by the
# buun-llama-cpp build to account for four gaps between upstream and the fork:
#
# 1. Augment the kv_cache_types[] allow-list so `LoadModel` accepts the
# fork-specific `turbo2` / `turbo3` / `turbo4` cache types plus the buun
# additions `turbo2_tcq` / `turbo3_tcq`.
#
# 2. Wire up buun-exclusive speculative-decoding option handlers
# (tree_budget / draft_topk) alongside the existing spec_* handlers.
# These reference struct fields (common_params.speculative.tree_budget
# and .draft_topk) that only exist in buun's common/common.h — adding
# them to the shared backend/cpp/llama-cpp/grpc-server.cpp would break
# the stock llama-cpp build, so we inject them only into the buun copy.
#
# 3. Drop the logit_bias_eog argument from params_from_json_cmpl() call
# sites. Upstream added that argument after buun's fork point; buun still
# uses the 4-arg signature and derives the same data internally.
#
# 4. Replace `get_media_marker()` (added upstream in ggml-org/llama.cpp#21962,
# server-side random per-instance marker) with the legacy "<__media__>"
# literal. The fork branched before that PR, so server-common.cpp has no
# get_media_marker symbol. The fork's mtmd_default_marker() still returns
# "<__media__>", and Go-side tooling falls back to that sentinel when the
# backend does not expose media_marker, so substituting the literal keeps
# behavior identical on the buun path.
#
# We patch the *copy* sitting in buun-llama-cpp-<flavor>-build/, never the
# original under backend/cpp/llama-cpp/, so the stock llama-cpp build keeps
# compiling against vanilla upstream.
#
# Idempotent: skips each insertion if its marker is already present (so re-runs
# of the same build dir don't double-insert).

set -euo pipefail

if [[ $# -ne 1 ]]; then
echo "usage: $0 <grpc-server.cpp>" >&2
exit 2
fi

SRC=$1

if [[ ! -f "$SRC" ]]; then
echo "grpc-server.cpp not found at $SRC" >&2
exit 2
fi

if grep -q 'GGML_TYPE_TURBO2_TCQ' "$SRC"; then
echo "==> $SRC already has buun cache types, skipping KV allow-list patch"
else
echo "==> patching $SRC to allow turbo2/turbo3/turbo4/turbo2_tcq/turbo3_tcq KV-cache types"

# Insert the five TURBO entries right after the first ` GGML_TYPE_Q5_1,`
# line (the kv_cache_types[] allow-list). Using awk because the builder
# image does not ship python3, and GNU sed's multi-line `a\` quoting is
# awkward.
awk '
/^ GGML_TYPE_Q5_1,$/ && !done {
print
print " // buun-llama-cpp fork extras — added by patch-grpc-server.sh"
print " GGML_TYPE_TURBO2_0,"
print " GGML_TYPE_TURBO3_0,"
print " GGML_TYPE_TURBO4_0,"
print " GGML_TYPE_TURBO2_TCQ,"
print " GGML_TYPE_TURBO3_TCQ,"
done = 1
next
}
{ print }
END {
if (!done) {
print "patch-grpc-server.sh: anchor ` GGML_TYPE_Q5_1,` not found" > "/dev/stderr"
exit 1
}
}
' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"

echo "==> KV allow-list patch OK"
fi

if grep -q 'optname, "tree_budget"' "$SRC"; then
echo "==> $SRC already has DFlash option handlers, skipping"
else
echo "==> patching $SRC to add tree_budget / draft_topk option handlers"

# Insert two new `else if` handlers between the inner close-brace of the
# `spec_p_split` block and the next `} else if (…spec_ngram_size_n…)` line.
# Upstream writes each `} else if` as a single physical line, so we don't
# emit an outer `}` ourselves — the existing next line provides both the
# close of our `draft_topk` block and the open of `spec_ngram_size_n`.
# Anchor on the exact 3-line body of spec_p_split so we can't drift.
awk '
prev2 == " } else if (!strcmp(optname, \"spec_p_split\")) {" &&
prev1 ~ /^ +if \(optval != NULL\) \{$/ &&
$0 ~ /^ +try \{ params\.speculative\.p_split = std::stof\(optval_str\); \} catch \(\.\.\.\) \{\}$/ &&
!done {
print # print the try-line itself
getline inner_close # read " }" closing the inner if
print inner_close # print it — this closes spec_p_split body
print " // buun-llama-cpp DFlash options — added by patch-grpc-server.sh"
print " } else if (!strcmp(optname, \"tree_budget\")) {"
print " if (optval != NULL) {"
print " try { params.speculative.tree_budget = std::stoi(optval_str); } catch (...) {}"
print " }"
print " } else if (!strcmp(optname, \"draft_topk\")) {"
print " if (optval != NULL) {"
print " try { params.speculative.draft_topk = std::stoi(optval_str); } catch (...) {}"
print " }"
# The next source line (`} else if (…spec_ngram_size_n…) {`) closes
# our draft_topk block and continues the chain naturally; fall back
# into the main loop to emit it and everything after.
done = 1
prev2 = prev1
prev1 = inner_close
next
}
{ print; prev2 = prev1; prev1 = $0 }
END {
if (!done) {
print "patch-grpc-server.sh: spec_p_split anchor not found" > "/dev/stderr"
exit 1
}
}
' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"

echo "==> DFlash option-handler patch OK"
fi

if grep -qE 'ctx_server\.get_meta\(\)\.logit_bias_eog|params_base\.sampling\.logit_bias_eog,' "$SRC"; then
echo "==> patching $SRC to drop the logit_bias_eog arg from params_from_json_cmpl() callsites (buun still uses the pre-refactor 4-arg signature)"
# Upstream llama.cpp refactored params_from_json_cmpl to take a precomputed
# logit_bias_eog vector after buun's 2026-04-05 fork-point — simultaneously
# adding server_context_meta::logit_bias_eog as the supplier. Buun carries
# neither change: its params_from_json_cmpl is still 4-arg, and internally
# derives logit_bias_eog from the common_params it's passed. So we just
# delete the argument line entirely — the remaining 4 args match buun's
# signature and the resulting behavior matches upstream bit-for-bit
# (upstream's 5th arg is the same data buun derives internally).
#
# Guard is broad so this works whether the line has been run through this
# block before (leaving params_base.sampling.logit_bias_eog,) or not
# (leaving the original ctx_server.get_meta().logit_bias_eog,).
sed -E '/^[[:space:]]+(ctx_server\.get_meta\(\)\.logit_bias_eog|params_base\.sampling\.logit_bias_eog),$/d' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"
echo "==> logit_bias_eog arg drop OK"
else
echo "==> $SRC has no logit_bias_eog arg line, skipping"
fi

if grep -q 'get_media_marker()' "$SRC"; then
echo "==> patching $SRC to replace get_media_marker() with legacy \"<__media__>\" literal"
# Only one call site today (ModelMetadata), but replace all occurrences to
# stay robust if upstream adds more. Use a temp file to avoid relying on
# sed -i portability (the builder image uses GNU sed, but we keep this
# consistent with the awk block above).
sed 's/get_media_marker()/"<__media__>"/g' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"
echo "==> get_media_marker() substitution OK"
else
echo "==> $SRC has no get_media_marker() call, skipping media-marker patch"
fi

echo "==> all patches applied"
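Each block in `patch-grpc-server.sh` follows the same pattern:
1. Skip the block if a marker is already present.
2. Otherwise, stream the file through a filter into `$SRC.tmp`.
3. Fail if the anchor was never seen.
4. `mv` the result into place.

Below is a generic sketch of the single-anchor variant. The `insert_after_anchor` name is hypothetical. One design choice differs from the script: the anchor and text are passed through the environment rather than with `awk -v`. `-v` applies backslash-escape processing, which would turn a regex escape such as `\(` into a bare `(`, while `ENVIRON` leaves the value as-is. Unlike the script, the sketch also removes the temp file when the anchor is missing.

```shell
# insert_after_anchor FILE ANCHOR_ERE GUARD TEXT
# Hypothetical helper: if the fixed string GUARD already occurs in FILE, leave
# FILE untouched (idempotent re-runs); otherwise emit TEXT as a new line right
# after the first line matching ANCHOR_ERE. Fails without modifying FILE when
# no line matches.
insert_after_anchor() {
  local file=$1 anchor=$2 guard=$3 text=$4
  if grep -qF -- "$guard" "$file"; then
    echo "already patched: $file"
    return 0
  fi
  if ! ANCHOR_RE=$anchor INSERT_TEXT=$text awk '
      BEGIN { re = ENVIRON["ANCHOR_RE"]; text = ENVIRON["INSERT_TEXT"] }
      $0 ~ re && !done { print; print text; done = 1; next }
      { print }
      END { if (!done) { print "anchor not found: " re > "/dev/stderr"; exit 1 } }
    ' "$file" > "$file.tmp"; then
    rm -f "$file.tmp"
    return 1
  fi
  mv "$file.tmp" "$file"
}
```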
@@ -1,46 +0,0 @@
Subject: [PATCH] ggml-cuda/fattn: provide atomicAdd(double*,double) shim for pre-sm_60

Buun's Q² calibration path in ggml_cuda_turbo_scale_q calls
atomicAdd(&d_q_channel_sq_fattn[threadIdx.x], (double)(val * val));
but native double atomicAdd is only available on compute capability 6.0
and newer. Compiling against a CUDA arch list that includes older
architectures (LocalAI's CUDA 12 Docker image builds for the full
published arch range) fails with:

fattn.cu(812): error: no instance of overloaded function "atomicAdd"
matches the argument list, argument types are: (double *, double)

Add the canonical CUDA-programming-guide shim at the top of fattn.cu so
pre-sm_60 codegen has a definition to call. On sm_60+ the native CUDA
intrinsic is used and the shim is elided via __CUDA_ARCH__.

--- a/ggml/src/ggml-cuda/fattn.cu
+++ b/ggml/src/ggml-cuda/fattn.cu
@@ -7,6 +7,27 @@

#include <atomic>

+// Pre-sm_60 double atomicAdd shim. Native double atomicAdd(double*,double)
+// is only available on CUDA compute capability 6.0+ (see CUDA C Programming
+// Guide, B.15 Atomic Functions). Buun's Q² calibration path below calls
+// atomicAdd with a double*; without this definition, nvcc fails to find a
+// matching overload whenever the compile target list includes pre-sm_60
+// architectures. The standard CAS loop implementation below matches the
+// semantics of the native intrinsic.
+#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
+static __device__ double atomicAdd(double * address, double val) {
+ unsigned long long int * address_as_ull = (unsigned long long int *)address;
+ unsigned long long int old = *address_as_ull;
+ unsigned long long int assumed;
+ do {
+ assumed = old;
+ old = atomicCAS(address_as_ull, assumed,
+ __double_as_longlong(val + __longlong_as_double(assumed)));
+ } while (assumed != old);
+ return __longlong_as_double(old);
+}
+#endif
+
// InnerQ: update the fattn-side inverse scale array from host (all devices)
void turbo_innerq_update_fattn_scales(const float * scale_inv) {
int cur_device;
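Whether the shim is compiled at all depends on the target list. `__CUDA_ARCH__` is defined separately for each device-compilation pass, so the `#if` body is only emitted for architectures below 6.0.

The hypothetical helper below reports which entries of a semicolon-separated `CMAKE_CUDA_ARCHITECTURES`-style list fall below sm_60. One such list is the `CUDA_DOCKER_ARCH` value that the Dockerfile forwards. The helper strips CMake's `-real`/`-virtual` suffixes and ignores non-numeric entries such as `native`.

```shell
# pre_sm60_archs LIST: print each numeric architecture in the ';'-separated LIST
# that is below compute capability 6.0 (i.e. where the CAS-loop shim applies).
pre_sm60_archs() {
  local IFS=';' arch
  for arch in $1; do          # unquoted on purpose: split on ';' via IFS
    arch=${arch%%-*}          # "52-real" -> "52", "75-virtual" -> "75"
    if [[ $arch =~ ^[0-9]+$ ]] && (( arch < 60 )); then
      echo "$arch"
    fi
  done
}

pre_sm60_archs '50;61;75;86'
```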
@@ -1,32 +0,0 @@
Subject: [PATCH] ggml-cuda/argmax: pass WARP_SIZE to the top-K __shfl_xor_sync calls

Two __shfl_xor_sync calls in the top-K intra-warp merge drop the `width`
argument and rely on the CUDA default (warpSize). Every other call in
the same file already passes WARP_SIZE explicitly, and the HIP/ROCm
compatibility shim at ggml/src/ggml-cuda/vendors/hip.h:33 is a 4-arg
function-like macro — so the 3-arg form fails to preprocess when
building with hipcc against ROCm:

argmax.cu:265: error: too few arguments provided to function-like
macro invocation
note: macro '__shfl_xor_sync' defined here:
#define __shfl_xor_sync(mask, var, laneMask, width) \
__shfl_xor(var, laneMask, width)

Align the two call sites with the rest of the file by passing WARP_SIZE
explicitly. On CUDA the generated code is unchanged (warpSize is the
default); on HIP it now matches the macro's arity.

--- a/ggml/src/ggml-cuda/argmax.cu
+++ b/ggml/src/ggml-cuda/argmax.cu
@@ -262,8 +262,8 @@
// Each step: lane gets partner's min element, if it beats our min, replace and re-heapify
for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1) {
for (int i = 0; i < K; i++) {
- float partner_val = __shfl_xor_sync(0xFFFFFFFF, heap_val[i], offset);
- int partner_idx = __shfl_xor_sync(0xFFFFFFFF, heap_idx[i], offset);
+ float partner_val = __shfl_xor_sync(0xFFFFFFFF, heap_val[i], offset, WARP_SIZE);
+ int partner_idx = __shfl_xor_sync(0xFFFFFFFF, heap_idx[i], offset, WARP_SIZE);
if (partner_val > heap_val[0]) {
heap_val[0] = partner_val;
heap_idx[0] = partner_idx;
@@ -1,24 +0,0 @@
Subject: [PATCH] ggml-cuda/vendors/hip: alias cudaMemcpy{To,From}Symbol to hip counterparts

Buun's Q² calibration + TCQ codebook upload paths in fattn.cu use
cudaMemcpyToSymbol / cudaMemcpyFromSymbol. The HIP-compat header in
ggml/src/ggml-cuda/vendors/hip.h already aliases the scalar cudaMemcpy
family (cudaMemcpy, cudaMemcpyAsync, cudaMemcpy2DAsync, …) but is
missing the symbol variants. Building with hipcc therefore fails with
15+ "use of undeclared identifier 'cudaMemcpyToSymbol'" errors.

Add the two missing aliases alongside the existing memcpy block. HIP
provides hipMemcpy{To,From}Symbol with the same signature as CUDA's
equivalents, so this is a straight name substitution.

--- a/ggml/src/ggml-cuda/vendors/hip.h
+++ b/ggml/src/ggml-cuda/vendors/hip.h
@@ -85,6 +85,8 @@
#define cudaMemcpyDeviceToDevice hipMemcpyDeviceToDevice
#define cudaMemcpyDeviceToHost hipMemcpyDeviceToHost
#define cudaMemcpyHostToDevice hipMemcpyHostToDevice
+#define cudaMemcpyToSymbol hipMemcpyToSymbol
+#define cudaMemcpyFromSymbol hipMemcpyFromSymbol
#define cudaMemcpyKind hipMemcpyKind
#define cudaMemset hipMemset
#define cudaMemsetAsync hipMemsetAsync
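The failure this patch fixes is a `cuda*` runtime call with no matching alias in `vendors/hip.h`, and it can be screened for before running hipcc. The sketch below is a rough, hypothetical check. It lists the `cudaXxx` identifiers used in a source file that have no `#define cudaXxx` line in the header. It is only a heuristic: identifiers that hip.h provides some other way, such as typedefs, includes, or macros defined elsewhere, would be reported as false positives.

```shell
# missing_hip_aliases SRC HIP_H: print cudaXxx identifiers used in SRC that
# have no "#define cudaXxx ..." line in HIP_H.
missing_hip_aliases() {
  local src=$1 hip_h=$2 sym
  # -w keeps prefixed names such as ggml_cudaFoo from matching.
  grep -owE 'cuda[A-Z][A-Za-z0-9_]*' "$src" | LC_ALL=C sort -u |
    while read -r sym; do
      grep -qE "^#define[[:space:]]+$sym([[:space:](]|\$)" "$hip_h" || echo "$sym"
    done
}
```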
@@ -1,36 +0,0 @@
Subject: [PATCH] ggml-cuda/fattn: pass WARP_SIZE to fwht128 __shfl_xor_sync calls

Same issue as the argmax top-K fix: two __shfl_xor_sync call sites in
the FWHT-128 butterfly kernels (ggml_cuda_fwht128 and fwht128_store_half)
use the 3-arg CUDA form and omit the `width` argument that the HIP
function-like macro in vendors/hip.h:33 requires. Hipcc fails with:

fattn.cu:512: too few arguments provided to function-like macro
invocation
note: macro '__shfl_xor_sync' defined here:
#define __shfl_xor_sync(mask, var, laneMask, width) \
__shfl_xor(var, laneMask, width)

Add WARP_SIZE to both calls. CUDA codegen is unchanged (warpSize is the
default); HIP now matches the macro arity.

--- a/ggml/src/ggml-cuda/fattn.cu
+++ b/ggml/src/ggml-cuda/fattn.cu
@@ -509,7 +509,7 @@
// Intra-warp passes: shuffle xor with stride h, no smem, no sync.
#pragma unroll
for (int h = 1; h <= 16; h *= 2) {
- const float other = __shfl_xor_sync(0xFFFFFFFF, val, h);
+ const float other = __shfl_xor_sync(0xFFFFFFFF, val, h, WARP_SIZE);
val = (tid & h) ? (other - val) : (val + other);
}

@@ -533,7 +533,7 @@
static __device__ __forceinline__ void fwht128_store_half(
float val, half * dst_base) {
const int tid = threadIdx.x;
- const float neighbor = __shfl_xor_sync(0xFFFFFFFF, val, 1);
+ const float neighbor = __shfl_xor_sync(0xFFFFFFFF, val, 1, WARP_SIZE);
if ((tid & 1) == 0) {
const half2 packed = __floats2half2_rn(val, neighbor);
*((half2 *)(dst_base + tid)) = packed;
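Both WARP_SIZE patches fix the same pattern: a three-argument `__shfl_xor_sync` call that the four-argument HIP macro cannot expand. The hypothetical grep-based lint below finds that pattern. It is deliberately simple and only recognizes calls whose arguments contain no nested parentheses or commas. That limitation still covers the four call sites patched above.

```shell
# find_three_arg_shfl FILE...: print "line:text" (or "file:line:text" for
# several files) for __shfl_xor_sync calls that pass exactly three arguments.
find_three_arg_shfl() {
  grep -nE '__shfl_xor_sync\([^,()]*,[^,()]*,[^,()]*\)' "$@"
}
```

Running it over `ggml/src/ggml-cuda/*.cu` before a hipcc build would surface new offenders early. Anything it misses will still fail at preprocessing time, as in the error messages quoted in the patches.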
@@ -1,65 +0,0 @@
#!/bin/bash
set -ex

# Get the absolute current dir where the script is located
CURDIR=$(dirname "$(realpath $0)")

cd /

echo "CPU info:"
grep -e "model\sname" /proc/cpuinfo | head -1
grep -e "flags" /proc/cpuinfo | head -1

BINARY=buun-llama-cpp-fallback

if grep -q -e "\savx\s" /proc/cpuinfo ; then
echo "CPU: AVX found OK"
if [ -e $CURDIR/buun-llama-cpp-avx ]; then
BINARY=buun-llama-cpp-avx
fi
fi

if grep -q -e "\savx2\s" /proc/cpuinfo ; then
echo "CPU: AVX2 found OK"
if [ -e $CURDIR/buun-llama-cpp-avx2 ]; then
BINARY=buun-llama-cpp-avx2
fi
fi

# Check avx 512
if grep -q -e "\savx512f\s" /proc/cpuinfo ; then
echo "CPU: AVX512F found OK"
if [ -e $CURDIR/buun-llama-cpp-avx512 ]; then
BINARY=buun-llama-cpp-avx512
fi
fi

if [ -n "$LLAMACPP_GRPC_SERVERS" ]; then
if [ -e $CURDIR/buun-llama-cpp-grpc ]; then
BINARY=buun-llama-cpp-grpc
fi
fi

# Prepend the lib/ directory next to this script to the dynamic library search path
if [ "$(uname)" == "Darwin" ]; then
export DYLD_LIBRARY_PATH=$CURDIR/lib:$DYLD_LIBRARY_PATH
else
export LD_LIBRARY_PATH=$CURDIR/lib:$LD_LIBRARY_PATH
# Tell rocBLAS where to find TensileLibrary data (GPU kernel tuning files)
if [ -d "$CURDIR/lib/rocblas/library" ]; then
export ROCBLAS_TENSILE_LIBPATH=$CURDIR/lib/rocblas/library
fi
fi

# If there is a lib/ld.so, use it
if [ -f $CURDIR/lib/ld.so ]; then
echo "Using lib/ld.so"
echo "Using binary: $BINARY"
exec $CURDIR/lib/ld.so $CURDIR/$BINARY "$@"
fi

echo "Using binary: $BINARY"
exec $CURDIR/$BINARY "$@"

# We should never reach this point; just in case we do, run the fallback binary
exec $CURDIR/buun-llama-cpp-fallback "$@"
@@ -1,5 +1,5 @@

- IK_LLAMA_VERSION?=16996aeab772c69b6473597038b2ef0b85297e8b
+ IK_LLAMA_VERSION?=3a945af45d45936341a45bbf7deda56776a4af26
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp

CMAKE_ARGS?=

@@ -1,5 +1,5 @@

- LLAMA_VERSION?=187a45637054881ecacf17f8e2f6f8f2ba7df1c7
+ LLAMA_VERSION?=f53577432541bb9edc1588c4ef45c66bf07e4468
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

CMAKE_ARGS?=

@@ -642,6 +642,21 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
} else if (optval_str == "false" || optval_str == "0" || optval_str == "no" || optval_str == "off" || optval_str == "disabled") {
params.no_op_offload = false;
}
+ } else if (!strcmp(optname, "split_mode") || !strcmp(optname, "sm")) {
+ // Accepts: none | layer | row | tensor (the latter requires a llama.cpp build
+ // that includes ggml-org/llama.cpp#19378, FlashAttention enabled, and KV-cache
+ // quantization disabled).
+ if (optval != NULL) {
+ if (optval_str == "none") {
+ params.split_mode = LLAMA_SPLIT_MODE_NONE;
+ } else if (optval_str == "layer") {
+ params.split_mode = LLAMA_SPLIT_MODE_LAYER;
+ } else if (optval_str == "row") {
+ params.split_mode = LLAMA_SPLIT_MODE_ROW;
+ } else if (optval_str == "tensor") {
+ params.split_mode = LLAMA_SPLIT_MODE_TENSOR;
+ }
+ }
} else if (!strcmp(optname, "kv_unified") || !strcmp(optname, "unified_kv")) {
if (optval_str == "true" || optval_str == "1" || optval_str == "yes" || optval_str == "on" || optval_str == "enabled") {
params.kv_unified = true;

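The new `split_mode`/`sm` branch maps four exact, case-sensitive strings onto llama.cpp's split-mode enum. For any other value it leaves `params.split_mode` untouched. The Go sketch below implements the same mapping with a lookup table, for illustration only. `SplitMode` and its constants are hypothetical stand-ins for `LLAMA_SPLIT_MODE_*`. Unlike the C++ branch, the sketch reports unknown values back to the caller instead of dropping them silently.

```go
package main

import "fmt"

// SplitMode is a hypothetical stand-in for llama.cpp's LLAMA_SPLIT_MODE_* enum.
type SplitMode int

const (
	SplitNone SplitMode = iota
	SplitLayer
	SplitRow
	SplitTensor // per the diff comment: needs llama.cpp#19378, FlashAttention on, KV-cache quantization off
)

// splitModes holds the accepted spellings; matching is exact, like the C++ string compares.
var splitModes = map[string]SplitMode{
	"none":   SplitNone,
	"layer":  SplitLayer,
	"row":    SplitRow,
	"tensor": SplitTensor,
}

// parseSplitMode returns ok=false for unrecognized values so the caller can
// decide whether to warn or keep its current default.
func parseSplitMode(s string) (SplitMode, bool) {
	m, ok := splitModes[s]
	return m, ok
}

func main() {
	for _, v := range []string{"layer", "tensor", "rows"} {
		m, ok := parseSplitMode(v)
		fmt.Println(v, m, ok)
	}
}
```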
@@ -1,7 +1,7 @@

# Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
# Auto-bumped nightly by .github/workflows/bump_deps.yaml.
- TURBOQUANT_VERSION?=627ebbc6e27727bd4f65422d8aa60b13404993c8
+ TURBOQUANT_VERSION?=11a241d0db78a68e0a5b99fe6f36de6683100f6a
LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant

CMAKE_ARGS?=

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
- STABLEDIFFUSION_GGML_VERSION?=c97702e1057c2fe13a7074cd9069cb9dd6edc1bf
+ STABLEDIFFUSION_GGML_VERSION?=b8bdffc19962be7e5a84bfefeb2e31bd885b571a

CMAKE_ARGS+=-DGGML_MAX_NAME=128

@@ -139,7 +139,10 @@ func (w *Whisper) AudioTranscription(opts *pb.TranscriptRequest) (pb.TranscriptR
// segment start/end conversion factor taken from https://github.com/ggml-org/whisper.cpp/blob/master/examples/cli/cli.cpp#L895
s := CppGetSegmentStart(i) * (10000000)
t := CppGetSegmentEnd(i) * (10000000)
- txt := strings.Clone(CppGetSegmentText(i))
+ // whisper.cpp can emit bytes that aren't valid UTF-8 (e.g. a multibyte
+ // codepoint split across token boundaries); protobuf string fields
+ // reject those at marshal time. Scrub before the value escapes cgo.
+ txt := strings.ToValidUTF8(strings.Clone(CppGetSegmentText(i)), "<22>")
tokens := make([]int32, CppNTokens(i))

if opts.Diarize && CppGetSegmentSpeakerTurnNext(i) {

@@ -263,6 +263,8 @@
amd: "rocm-vllm"
intel: "intel-vllm"
nvidia-cuda-12: "cuda12-vllm"
+ nvidia-cuda-13: "cuda13-vllm"
+ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm"
cpu: "cpu-vllm"
- &sglang
name: "sglang"
@@ -285,6 +287,7 @@
amd: "rocm-sglang"
intel: "intel-sglang"
nvidia-cuda-12: "cuda12-sglang"
+ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sglang"
cpu: "cpu-sglang"
- &vllm-omni
name: "vllm-omni"
@@ -311,6 +314,8 @@
nvidia: "cuda12-vllm-omni"
amd: "rocm-vllm-omni"
nvidia-cuda-12: "cuda12-vllm-omni"
+ nvidia-cuda-13: "cuda13-vllm-omni"
+ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-omni"
- &mlx
name: "mlx"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx"
@@ -1608,6 +1613,20 @@
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-turboquant
## whisper
+ - !!merge <<: *whispercpp
+ name: "whisper-development"
+ capabilities:
+ default: "cpu-whisper-development"
+ nvidia: "cuda12-whisper-development"
+ intel: "intel-sycl-f16-whisper-development"
+ metal: "metal-whisper-development"
+ amd: "rocm-whisper-development"
+ vulkan: "vulkan-whisper-development"
+ nvidia-l4t: "nvidia-l4t-arm64-whisper-development"
+ nvidia-cuda-13: "cuda13-whisper-development"
+ nvidia-cuda-12: "cuda12-whisper-development"
+ nvidia-l4t-cuda-12: "nvidia-l4t-arm64-whisper-development"
+ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-whisper-development"
- !!merge <<: *whispercpp
name: "nvidia-l4t-arm64-whisper"
uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-whisper"
@@ -1814,12 +1833,25 @@
nvidia: "cuda12-vllm-development"
amd: "rocm-vllm-development"
intel: "intel-vllm-development"
+ nvidia-cuda-12: "cuda12-vllm-development"
+ nvidia-cuda-13: "cuda13-vllm-development"
+ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-development"
cpu: "cpu-vllm-development"
- !!merge <<: *vllm
name: "cuda12-vllm"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-vllm
+ - !!merge <<: *vllm
+ name: "cuda13-vllm"
+ uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-vllm"
+ mirrors:
+ - localai/localai-backends:latest-gpu-nvidia-cuda-13-vllm
+ - !!merge <<: *vllm
+ name: "cuda13-nvidia-l4t-arm64-vllm"
+ uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm"
+ mirrors:
+ - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm
- !!merge <<: *vllm
name: "rocm-vllm"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-vllm"
@@ -1840,6 +1872,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-vllm"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-vllm
+ - !!merge <<: *vllm
+ name: "cuda13-vllm-development"
+ uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-vllm"
+ mirrors:
+ - localai/localai-backends:master-gpu-nvidia-cuda-13-vllm
+ - !!merge <<: *vllm
+ name: "cuda13-nvidia-l4t-arm64-vllm-development"
+ uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-vllm"
+ mirrors:
+ - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-vllm
- !!merge <<: *vllm
name: "rocm-vllm-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-vllm"
@@ -1862,12 +1904,19 @@
nvidia: "cuda12-sglang-development"
amd: "rocm-sglang-development"
intel: "intel-sglang-development"
+ nvidia-cuda-12: "cuda12-sglang-development"
+ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-sglang-development"
cpu: "cpu-sglang-development"
- !!merge <<: *sglang
name: "cuda12-sglang"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-sglang"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-sglang
+ - !!merge <<: *sglang
+ name: "cuda13-nvidia-l4t-arm64-sglang"
+ uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-sglang"
+ mirrors:
+ - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-sglang
- !!merge <<: *sglang
name: "rocm-sglang"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-sglang"
@@ -1888,6 +1937,11 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-sglang"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-sglang
+ - !!merge <<: *sglang
+ name: "cuda13-nvidia-l4t-arm64-sglang-development"
+ uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-sglang"
+ mirrors:
+ - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-sglang
- !!merge <<: *sglang
name: "rocm-sglang-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-sglang"
@@ -1910,11 +1964,23 @@
nvidia: "cuda12-vllm-omni-development"
amd: "rocm-vllm-omni-development"
nvidia-cuda-12: "cuda12-vllm-omni-development"
+ nvidia-cuda-13: "cuda13-vllm-omni-development"
+ nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-vllm-omni-development"
- !!merge <<: *vllm-omni
name: "cuda12-vllm-omni"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm-omni"
mirrors:
- localai/localai-backends:latest-gpu-nvidia-cuda-12-vllm-omni
+ - !!merge <<: *vllm-omni
+ name: "cuda13-vllm-omni"
+ uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-vllm-omni"
+ mirrors:
+ - localai/localai-backends:latest-gpu-nvidia-cuda-13-vllm-omni
+ - !!merge <<: *vllm-omni
+ name: "cuda13-nvidia-l4t-arm64-vllm-omni"
+ uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm-omni"
+ mirrors:
+ - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-vllm-omni
- !!merge <<: *vllm-omni
name: "rocm-vllm-omni"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-vllm-omni"
@@ -1925,6 +1991,16 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-vllm-omni"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-vllm-omni
+ - !!merge <<: *vllm-omni
+ name: "cuda13-vllm-omni-development"
+ uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-vllm-omni"
+ mirrors:
+ - localai/localai-backends:master-gpu-nvidia-cuda-13-vllm-omni
+ - !!merge <<: *vllm-omni
+ name: "cuda13-nvidia-l4t-arm64-vllm-omni-development"
+ uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-vllm-omni"
+ mirrors:
+ - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-vllm-omni
- !!merge <<: *vllm-omni
name: "rocm-vllm-omni-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-vllm-omni"

@@ -1,2 +1,2 @@
- git+https://github.com/Blaizzy/mlx-vlm
+ git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cpu]
@@ -1,2 +1,2 @@
- git+https://github.com/Blaizzy/mlx-vlm
+ git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cuda12]
@@ -1,2 +1,2 @@
- git+https://github.com/Blaizzy/mlx-vlm
+ git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cuda13]
@@ -1,2 +1,2 @@
- git+https://github.com/Blaizzy/mlx-vlm
+ git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cuda12]
@@ -1,2 +1,2 @@
- git+https://github.com/Blaizzy/mlx-vlm
+ git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
mlx[cuda13]
@@ -1 +1 @@
- git+https://github.com/Blaizzy/mlx-vlm
+ git+https://github.com/Blaizzy/mlx-vlm@v0.4.4
@@ -23,6 +23,19 @@ if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi

+ # JetPack 7 / L4T arm64 wheels are built for cp312 and shipped via
+ # pypi.jetson-ai-lab.io. Bump the venv Python so the prebuilt sglang
+ # wheel resolves cleanly. unsafe-best-match is required because the
+ # jetson-ai-lab index lists transitive deps (e.g. decord) at older
+ # versions only — without it uv refuses to fall through to PyPI for a
+ # compatible wheel and resolution fails.
+ if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
+ PYTHON_VERSION="3.12"
+ PYTHON_PATCH="12"
+ PY_STANDALONE_TAG="20251120"
+ EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
+ fi
+
# sglang's CPU path has no prebuilt wheel on PyPI — upstream publishes
# a separate pyproject_cpu.toml that must be swapped in before `pip install`.
# Reference: docker/xeon.Dockerfile in the sglang upstream repo.

12
backend/python/sglang/requirements-l4t13.txt
Normal file
@@ -0,0 +1,12 @@
+ --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
+ accelerate
+ torch
+ torchvision
+ torchaudio
+ transformers
+ # Drop the [all] extra: it pulls outlines/decord, and decord has no
+ # aarch64 cp312 wheel anywhere (PyPI nor the jetson-ai-lab index ships
+ # only legacy cp35-cp37). With [all] uv backtracks through versions
+ # trying to satisfy decord and lands on sglang==0.1.16. Floor at 0.5.0
+ # so uv can't silently downgrade if a future resolution misfires.
+ sglang>=0.5.0
@@ -12,11 +12,15 @@ else
source $backend_dir/../common/libbackend.sh
fi

- # Handle l4t build profiles (Python 3.12, pip fallback) if needed
+ # Handle l4t build profiles (Python 3.12, pip fallback) if needed.
+ # unsafe-best-match is required on l4t13 because the jetson-ai-lab index
+ # lists transitive deps at limited versions — without it uv pins to the
+ # first matching index and fails to resolve a compatible wheel from PyPI.
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
PYTHON_VERSION="3.12"
PYTHON_PATCH="12"
PY_STANDALONE_TAG="20251120"
+ EXTRA_PIP_INSTALL_FLAGS="${EXTRA_PIP_INSTALL_FLAGS:-} --index-strategy=unsafe-best-match"
fi

if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
@@ -26,7 +30,11 @@ fi
# Install base requirements first
installRequirements

- # Install vllm based on build type
+ # Install vllm based on build type. vllm-omni tracks vllm master from
+ # source (cloned below) so we leave the upstream vllm dependency unpinned
+ # — vllm 0.19+ ships cu130 wheels by default, which is what we want for
+ # cublas13. Older cuda12/rocm/cpu paths still resolve a compatible wheel
+ # from the relevant channel.
if [ "x${BUILD_TYPE}" == "xhipblas" ]; then
# ROCm
if [ "x${USE_PIP}" == "xtrue" ]; then
@@ -34,8 +42,26 @@ if [ "x${BUILD_TYPE}" == "xhipblas" ]; then
else
uv pip install vllm==0.14.0 --extra-index-url https://wheels.vllm.ai/rocm/0.14.0/rocm700
fi
+ elif [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
+ # JetPack 7 / L4T arm64 cu130 — vllm comes from the prebuilt SBSA wheel
+ # at jetson-ai-lab. Version is unpinned: the index ships whatever build
+ # matches the cu130/cp312 ABI. unsafe-best-match lets uv fall through
+ # to PyPI for transitive deps not present on the jetson-ai-lab index.
+ if [ "x${USE_PIP}" == "xtrue" ]; then
+ pip install vllm --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
+ else
+ uv pip install --index-strategy=unsafe-best-match vllm --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
+ fi
+ elif [ "x${BUILD_PROFILE}" == "xcublas13" ]; then
+ # vllm 0.19+ defaults to cu130 wheels on PyPI, no extra index needed.
+ if [ "x${USE_PIP}" == "xtrue" ]; then
+ pip install vllm --torch-backend=auto
+ else
+ uv pip install vllm --torch-backend=auto
+ fi
elif [ "x${BUILD_TYPE}" == "xcublas" ] || [ "x${BUILD_TYPE}" == "x" ]; then
- # CUDA (default) or CPU
+ # cuda12 / CPU — keep the 0.14.0 pin for compatibility with the existing
+ # cuda12 vllm-omni image; bumping should be its own change.
if [ "x${USE_PIP}" == "xtrue" ]; then
pip install vllm==0.14.0 --torch-backend=auto
else

5
backend/python/vllm-omni/requirements-cublas13.txt
Normal file
@@ -0,0 +1,5 @@
+ --extra-index-url https://download.pytorch.org/whl/cu130
+ accelerate
+ torch
+ transformers
+ bitsandbytes
13
backend/python/vllm-omni/requirements-l4t13.txt
Normal file
@@ -0,0 +1,13 @@
+ --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
+ accelerate
+ torch
+ torchvision
+ torchaudio
+ transformers
+ bitsandbytes
+ flash-attn
+ diffusers
+ librosa
+ soundfile
+ pillow
+ numpy
@@ -32,6 +32,22 @@ if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi

+ # JetPack 7 / L4T arm64 wheels (torch, vllm, flash-attn) live on
+ # pypi.jetson-ai-lab.io and are built for cp312, so bump the venv Python
+ # accordingly. JetPack 6 keeps cp310 + USE_PIP=true. unsafe-best-match
+ # is required because the jetson-ai-lab index lists transitive deps at
+ # limited versions — without it uv pins to the first matching index and
+ # fails to resolve a compatible wheel from PyPI.
+ if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
+ USE_PIP=true
+ fi
+ if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
+ PYTHON_VERSION="3.12"
+ PYTHON_PATCH="12"
+ PY_STANDALONE_TAG="20251120"
+ EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
+ fi
+
# FROM_SOURCE=true on a CPU build skips the prebuilt vllm wheel in
# requirements-cpu-after.txt and compiles vllm locally against the host's
# actual CPU. Not used by default because it takes ~30-40 minutes, but

@@ -1,2 +1,9 @@
- https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
+ # flash-attn wheels are ABI-tied to a specific torch version. vllm forces
+ # torch==2.10.0 as a hard dep, but flash-attn 2.8.3 (latest) only ships
+ # prebuilt wheels up to torch 2.8 — any wheel we pin here gets silently
+ # broken when vllm upgrades torch during install, producing an undefined
+ # libc10_cuda symbol at import time. FlashInfer (required by vllm) covers
+ # attention, and rotary_embedding/common.py guards the flash_attn import
+ # with find_spec(), so skipping flash-attn is safe and the only stable
+ # choice until upstream ships a torch-2.10 wheel.
vllm

@@ -1,4 +1,4 @@
accelerate
- torch==2.7.0
+ torch
transformers
bitsandbytes
2
backend/python/vllm/requirements-cublas13-after.txt
Normal file
@@ -0,0 +1,2 @@
+ --extra-index-url https://download.pytorch.org/whl/cu130
+ vllm
5
backend/python/vllm/requirements-cublas13.txt
Normal file
@@ -0,0 +1,5 @@
+ --extra-index-url https://download.pytorch.org/whl/cu130
+ accelerate
+ torch
+ transformers
+ bitsandbytes
2
backend/python/vllm/requirements-l4t13-after.txt
Normal file
@@ -0,0 +1,2 @@
+ --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
+ vllm
8
backend/python/vllm/requirements-l4t13.txt
Normal file
@@ -0,0 +1,8 @@
+ --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
+ accelerate
+ torch
+ torchvision
+ torchaudio
+ transformers
+ bitsandbytes
+ flash-attn
4
backend/rust/kokoros/Cargo.lock
generated
@@ -1867,9 +1867,9 @@ dependencies = [

[[package]]
name = "rustls-webpki"
- version = "0.103.10"
+ version = "0.103.13"
source = "registry+https://github.com/rust-lang/crates.io-index"
- checksum = "df33b2b81ac578cabaf06b89b0631153a3f416b0a886e8a7a1707fb51abbd1ef"
+ checksum = "61c429a8649f110dddef65e2a5ad240f747e85f7758a6bccc7e5777bd33f756e"
dependencies = [
"ring",
"rustls-pki-types",

@@ -90,6 +90,14 @@ type WorkerCMD struct {
RegistrationToken string `env:"LOCALAI_REGISTRATION_TOKEN" help:"Token for authenticating with the frontend" group:"registration"`
HeartbeatInterval string `env:"LOCALAI_HEARTBEAT_INTERVAL" default:"10s" help:"Interval between heartbeats" group:"registration"`
NodeLabels string `env:"LOCALAI_NODE_LABELS" help:"Comma-separated key=value labels for this node (e.g. tier=fast,gpu=a100)" group:"registration"`
+ // MaxReplicasPerModel caps how many replicas of any one model can run on
+ // this worker concurrently. Default 1 = historical single-replica
+ // behavior. Set higher when a node has enough VRAM to host multiple
+ // copies of the same model (e.g. a fat 128 GiB box running 4× of a
+ // 24 GiB model for throughput). The auto-label `node.replica-slots=N`
+ // is published so model schedulers can target high-capacity nodes via
+ // the existing label selector.
+ MaxReplicasPerModel int `env:"LOCALAI_MAX_REPLICAS_PER_MODEL" default:"1" help:"Max replicas of any single model on this worker. Default 1 preserves single-replica behavior; set higher to allow stacking replicas on a fat node." group:"registration"`

// NATS (required)
NatsURL string `env:"LOCALAI_NATS_URL" required:"" help:"NATS server URL" group:"distributed"`
@@ -567,22 +575,35 @@ func (s *backendSupervisor) getAddr(backend string) string {
return ""
}

+ // buildProcessKey is the supervisor's stable identifier for a backend gRPC
+ // process. It includes the replica index so the same model can run multiple
+ // processes on a worker simultaneously without colliding on the same map slot
+ // or port. The "#N" suffix is purely internal — the controller never reads it.
+ func buildProcessKey(modelID, backend string, replicaIndex int) string {
+ base := modelID
+ if base == "" {
+ base = backend
+ }
+ return fmt.Sprintf("%s#%d", base, replicaIndex)
+ }
+
// installBackend handles the backend.install flow:
- // 1. If already running for this model, return existing address
+ // 1. If already running for this (model, replica) slot, return existing address
// 2. Install backend from gallery (if not already installed)
// 3. Find backend binary
// 4. Start gRPC process on a new port
// Returns the gRPC address of the backend process.
+ //
+ // ProcessKey includes the replica index so a worker with MaxReplicasPerModel>1
+ // can host multiple processes for the same model on distinct ports. Old
+ // controllers (no replica_index in the request) implicitly target replica 0,
+ // which preserves single-replica behavior.
func (s *backendSupervisor) installBackend(req messaging.BackendInstallRequest) (string, error) {
- // Process key: use ModelID if provided (per-model process), else backend name
- processKey := req.ModelID
- if processKey == "" {
- processKey = req.Backend
- }
+ processKey := buildProcessKey(req.ModelID, req.Backend, int(req.ReplicaIndex))

- // If already running for this model, return its address
+ // If already running for this model+replica, return its address
if addr := s.getAddr(processKey); addr != "" {
- xlog.Info("Backend already running for model", "backend", req.Backend, "model", req.ModelID, "addr", addr)
+ xlog.Info("Backend already running for model replica", "backend", req.Backend, "model", req.ModelID, "replica", req.ReplicaIndex, "addr", addr)
return addr, nil
}

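The replica suffix matters because the supervisor tracks processes by process key. With the old model-only key, a request for a second replica of the same model would find the first replica's entry and get its address back. The Go sketch below copies `buildProcessKey` from the diff and pairs it with a hypothetical `supervisor` type that only maps keys to addresses. The ports are made up.

```go
package main

import "fmt"

// buildProcessKey is copied from the diff.
func buildProcessKey(modelID, backend string, replicaIndex int) string {
	base := modelID
	if base == "" {
		base = backend
	}
	return fmt.Sprintf("%s#%d", base, replicaIndex)
}

// supervisor is a hypothetical, stripped-down stand-in for backendSupervisor:
// it only remembers which address serves which process key.
type supervisor struct {
	procs    map[string]string // process key -> gRPC address
	nextPort int
}

// install returns the existing address for a (model, replica) slot, or
// records a new "process" on the next free port.
func (s *supervisor) install(modelID, backend string, replica int) string {
	key := buildProcessKey(modelID, backend, replica)
	if addr, ok := s.procs[key]; ok {
		return addr
	}
	addr := fmt.Sprintf("127.0.0.1:%d", s.nextPort)
	s.nextPort++
	s.procs[key] = addr
	return addr
}

func main() {
	s := &supervisor{procs: map[string]string{}, nextPort: 50100}
	a := s.install("Qwen3-35B", "llama-cpp", 0)
	b := s.install("Qwen3-35B", "llama-cpp", 1) // distinct slot: distinct process
	c := s.install("Qwen3-35B", "llama-cpp", 0) // same slot: reused
	fmt.Println(a, b, c)
}
```

A request without a replica index decodes to Go's zero value, so it lands on the `#0` slot. This is the backward-compatible behavior the comment on `installBackend` describes.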
@@ -886,13 +907,18 @@ func (cmd *WorkerCMD) registrationBody() map[string]any {
totalVRAM, _ := xsysinfo.TotalAvailableVRAM()
gpuVendor, _ := xsysinfo.DetectGPUVendor()

+ maxReplicas := cmd.MaxReplicasPerModel
+ if maxReplicas < 1 {
+ maxReplicas = 1
+ }
body := map[string]any{
- "name":           nodeName,
- "address":        cmd.advertiseAddr(),
- "http_address":   cmd.advertiseHTTPAddr(),
- "total_vram":     totalVRAM,
- "available_vram": totalVRAM, // initially all VRAM is available
- "gpu_vendor":     gpuVendor,
+ "name":                   nodeName,
+ "address":                cmd.advertiseAddr(),
+ "http_address":           cmd.advertiseHTTPAddr(),
+ "total_vram":             totalVRAM,
+ "available_vram":         totalVRAM, // initially all VRAM is available
+ "gpu_vendor":             gpuVendor,
+ "max_replicas_per_model": maxReplicas,
}

// If no GPU detected, report system RAM so the scheduler/UI has capacity info
@@ -906,39 +932,40 @@ func (cmd *WorkerCMD) registrationBody() map[string]any {
body["token"] = cmd.RegistrationToken
}

- // Parse and add static node labels
+ // Parse and add static node labels. Always include the auto-label
+ // `node.replica-slots=N` so AND-selectors in ModelSchedulingConfig can
+ // target high-capacity nodes (e.g. {"node.replica-slots":"4"}).
+ labels := make(map[string]string)
if cmd.NodeLabels != "" {
- labels := make(map[string]string)
for _, pair := range strings.Split(cmd.NodeLabels, ",") {
pair = strings.TrimSpace(pair)
if k, v, ok := strings.Cut(pair, "="); ok {
labels[strings.TrimSpace(k)] = strings.TrimSpace(v)
}
}
- if len(labels) > 0 {
- body["labels"] = labels
- }
}
+ labels["node.replica-slots"] = strconv.Itoa(maxReplicas)
+ body["labels"] = labels

return body
}

// heartbeatBody returns the current VRAM/RAM stats for heartbeat payloads.
+ //
+ // When aggregate VRAM usage is unknown (no GPU, or temporary detection
+ // failure), we deliberately OMIT available_vram so the frontend keeps its
+ // last good value — overwriting with 0 makes the UI show the node as "fully
+ // used", while reporting total-as-available lies to the scheduler about
+ // free capacity.
func (cmd *WorkerCMD) heartbeatBody() map[string]any {
- var availVRAM uint64
+ body := map[string]any{}
aggregate := xsysinfo.GetGPUAggregateInfo()
if aggregate.TotalVRAM > 0 {
- availVRAM = aggregate.FreeVRAM
- } else {
- // Fallback: report total as available (no usage tracking possible)
- availVRAM, _ = xsysinfo.TotalAvailableVRAM()
+ body["available_vram"] = aggregate.FreeVRAM
}

- body := map[string]any{
- "available_vram": availVRAM,
- }
-
- // If no GPU, report system RAM usage instead
+ // CPU-only workers (or workers that lost GPU visibility momentarily):
+ // report system RAM so the scheduler still has capacity info.
if aggregate.TotalVRAM == 0 {
if ramInfo, err := xsysinfo.GetSystemRAMInfo(); err == nil {
body["available_ram"] = ramInfo.Available

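The heartbeat change relies on an absent key and a zero value meaning different things to the frontend. The Go sketch below shows the two payload shapes, assuming the body is JSON-encoded; the diff does not show the transport. `gpuAggregate` is a simplified, hypothetical stand-in for what `xsysinfo.GetGPUAggregateInfo` returns. The RAM figure is passed in directly, skipping the `GetSystemRAMInfo` error check.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gpuAggregate is a hypothetical stand-in for the xsysinfo aggregate info.
type gpuAggregate struct {
	TotalVRAM uint64
	FreeVRAM  uint64
}

// heartbeatBody follows the new logic: available_vram is set only when the
// GPU total is known; otherwise RAM is reported and the VRAM key is omitted.
func heartbeatBody(agg gpuAggregate, availableRAM uint64) map[string]any {
	body := map[string]any{}
	if agg.TotalVRAM > 0 {
		body["available_vram"] = agg.FreeVRAM
	}
	if agg.TotalVRAM == 0 {
		body["available_ram"] = availableRAM
	}
	return body
}

func main() {
	gpu, _ := json.Marshal(heartbeatBody(gpuAggregate{TotalVRAM: 24 << 30, FreeVRAM: 0}, 0))
	cpu, _ := json.Marshal(heartbeatBody(gpuAggregate{}, 8<<30))
	fmt.Println(string(gpu)) // a genuinely full GPU still reports 0 explicitly
	fmt.Println(string(cpu)) // no available_vram key, so the frontend keeps its last value
}
```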
70
core/cli/worker_replica_test.go
Normal file
@@ -0,0 +1,70 @@
+ package cli
+
+ import (
+ . "github.com/onsi/ginkgo/v2"
+ . "github.com/onsi/gomega"
+ )
+
+ var _ = Describe("Worker per-replica process keying", func() {
+ Describe("buildProcessKey", func() {
+ // Pin the supervisor's keying contract: distinct replica indexes for
+ // the same modelID produce distinct process keys, so the supervisor
+ // map can hold multiple processes for one model. Dropping the suffix
+ // would re-introduce the original flap (one model, one slot, churn).
+ DescribeTable("produces stable, distinct keys",
+ func(modelID, backend string, replica int, want string) {
+ Expect(buildProcessKey(modelID, backend, replica)).To(Equal(want))
+ },
+ Entry("modelID present, replica 0", "Qwen3-35B", "llama-cpp", 0, "Qwen3-35B#0"),
+ Entry("modelID present, replica 1", "Qwen3-35B", "llama-cpp", 1, "Qwen3-35B#1"),
+ Entry("falls back to backend when modelID empty", "", "llama-cpp", 0, "llama-cpp#0"),
+ Entry("backend fallback with replica 2", "", "llama-cpp", 2, "llama-cpp#2"),
+ )
+
+ It("makes replicas distinguishable", func() {
+ r0 := buildProcessKey("model-a", "llama-cpp", 0)
+ r1 := buildProcessKey("model-a", "llama-cpp", 1)
+ Expect(r0).ToNot(Equal(r1), "replicas of the same model must produce distinct keys")
+ })
+ })
+
+ Describe("registrationBody", func() {
+ It("includes max_replicas_per_model and the auto-label", func() {
+ cmd := &WorkerCMD{
+ Addr: "worker.example.com:50051",
+ MaxReplicasPerModel: 4,
+ }
+ body := cmd.registrationBody()
+
+ Expect(body).To(HaveKey("max_replicas_per_model"))
+ Expect(body["max_replicas_per_model"]).To(Equal(4))
+
+ labels, ok := body["labels"].(map[string]string)
+ Expect(ok).To(BeTrue(), "labels must be present so selectors can target the slot count")
+ Expect(labels).To(HaveKeyWithValue("node.replica-slots", "4"))
+ })
+
+ It("coerces zero/unset MaxReplicasPerModel to 1", func() {
+ cmd := &WorkerCMD{Addr: "worker.example.com:50051"}
+ body := cmd.registrationBody()
+ Expect(body["max_replicas_per_model"]).To(Equal(1),
+ "unset must default to single-replica behavior, not capacity 0")
+
+ labels := body["labels"].(map[string]string)
+ Expect(labels).To(HaveKeyWithValue("node.replica-slots", "1"))
+ })
+
+ It("preserves user-provided labels alongside the auto-label", func() {
+ cmd := &WorkerCMD{
+ Addr: "worker.example.com:50051",
+ MaxReplicasPerModel: 2,
+ NodeLabels: "tier=fast,gpu=a100",
+ }
+ body := cmd.registrationBody()
+ labels := body["labels"].(map[string]string)
+ Expect(labels).To(HaveKeyWithValue("tier", "fast"))
+ Expect(labels).To(HaveKeyWithValue("gpu", "a100"))
+ Expect(labels).To(HaveKeyWithValue("node.replica-slots", "2"))
+ })
+ })
+ })
@@ -37,14 +37,6 @@ var CacheTypeOptions = []FieldOption{
|
||||
{Value: "q4_1", Label: "Q4_1"},
|
||||
{Value: "q5_0", Label: "Q5_0"},
|
||||
{Value: "q5_1", Label: "Q5_1"},
|
||||
// TurboQuant KV-cache types — accepted by the turboquant and
|
||||
// buun-llama-cpp fork backends; stock llama-cpp will reject them at load.
|
||||
{Value: "turbo2", Label: "Turbo2 (TurboQuant)"},
|
||||
{Value: "turbo3", Label: "Turbo3 (TurboQuant)"},
|
||||
{Value: "turbo4", Label: "Turbo4 (TurboQuant)"},
|
||||
// Trellis-Coded Quantization variants — buun-llama-cpp only.
|
||||
{Value: "turbo2_tcq", Label: "Turbo2 TCQ (buun-llama-cpp)"},
|
||||
{Value: "turbo3_tcq", Label: "Turbo3 TCQ (buun-llama-cpp)"},
|
||||
}
|
||||
|
||||
var DiffusersPipelineOptions = []FieldOption{

@@ -34,7 +34,6 @@ func (i *LlamaCPPImporter) AdditionalBackends() []KnownBackendEntry {
	return []KnownBackendEntry{
		{Name: "ik-llama-cpp", Modality: "text", Description: "GGUF drop-in replacement for llama-cpp with ik-quants"},
		{Name: "turboquant", Modality: "text", Description: "GGUF drop-in replacement for llama-cpp with TurboQuant optimizations"},
		{Name: "buun-llama-cpp", Modality: "text", Description: "GGUF drop-in replacement for llama-cpp with DFlash speculative decoding and TurboQuant/TCQ KV-cache quantization"},
	}
}

@@ -128,7 +127,7 @@ func (i *LlamaCPPImporter) Import(details Details) (gallery.ModelConfig, error)
	backend := "llama-cpp"
	if b, ok := preferencesMap["backend"].(string); ok {
		switch b {
		case "ik-llama-cpp", "turboquant", "buun-llama-cpp":
		case "ik-llama-cpp", "turboquant":
			backend = b
		}
	}

@@ -181,23 +181,6 @@ var _ = Describe("LlamaCPPImporter", func() {
			Expect(modelConfig.Files[0].Filename).To(Equal("my-model.gguf"))
		})

		It("swaps the emitted backend to buun-llama-cpp when preferred", func() {
			preferences := json.RawMessage(`{"backend": "buun-llama-cpp"}`)
			details := Details{
				URI:         "https://example.com/my-model.gguf",
				Preferences: preferences,
			}

			modelConfig, err := importer.Import(details)

			Expect(err).ToNot(HaveOccurred())
			Expect(modelConfig.ConfigFile).To(ContainSubstring("backend: buun-llama-cpp"), fmt.Sprintf("Model config: %+v", modelConfig))
			Expect(modelConfig.ConfigFile).NotTo(ContainSubstring("backend: llama-cpp\n"), fmt.Sprintf("Model config: %+v", modelConfig))
			Expect(modelConfig.ConfigFile).To(ContainSubstring("model: my-model.gguf"), fmt.Sprintf("Model config: %+v", modelConfig))
			Expect(len(modelConfig.Files)).To(Equal(1))
			Expect(modelConfig.Files[0].Filename).To(Equal("my-model.gguf"))
		})

		It("keeps backend: llama-cpp for unknown backend preferences", func() {
			// Unknown backend values must not leak into the emitted YAML —
			// we only honour the two curated drop-in replacements.
@@ -392,7 +375,7 @@ var _ = Describe("LlamaCPPImporter", func() {
	})

	Context("AdditionalBackends", func() {
		It("advertises ik-llama-cpp, turboquant, and buun-llama-cpp as drop-in replacements", func() {
		It("advertises ik-llama-cpp and turboquant as drop-in replacements", func() {
			entries := importer.AdditionalBackends()

			names := make([]string, 0, len(entries))
@@ -401,7 +384,7 @@ var _ = Describe("LlamaCPPImporter", func() {
				names = append(names, e.Name)
				byName[e.Name] = e
			}
			Expect(names).To(ConsistOf("ik-llama-cpp", "turboquant", "buun-llama-cpp"))
			Expect(names).To(ConsistOf("ik-llama-cpp", "turboquant"))

			ik := byName["ik-llama-cpp"]
			Expect(ik.Modality).To(Equal("text"))
@@ -410,10 +393,6 @@ var _ = Describe("LlamaCPPImporter", func() {
			tq := byName["turboquant"]
			Expect(tq.Modality).To(Equal("text"))
			Expect(tq.Description).NotTo(BeEmpty())

			bn := byName["buun-llama-cpp"]
			Expect(bn.Modality).To(Equal("text"))
			Expect(bn.Description).NotTo(BeEmpty())
		})
	})
})

@@ -98,7 +98,7 @@ func (mgs *BackendEndpointService) GetAllStatusEndpoint() echo.HandlerFunc {
// @Param request body GalleryBackend true "query params"
// @Success 200 {object} schema.BackendResponse "Response"
// @Router /backends/apply [post]
func (mgs *BackendEndpointService) ApplyBackendEndpoint() echo.HandlerFunc {
func (mgs *BackendEndpointService) ApplyBackendEndpoint(systemState *system.SystemState) echo.HandlerFunc {
	return func(c echo.Context) error {
		input := new(GalleryBackend)
		// Get input data from the request body
@@ -106,6 +106,18 @@ func (mgs *BackendEndpointService) ApplyBackendEndpoint() echo.HandlerFunc {
			return err
		}

		// In distributed mode, refuse to fan out a hardware-specific build to
		// every node — a CPU build landing on a GPU cluster is almost always
		// wrong, and the silent footgun is exactly what this guard exists for.
		// Auto-resolving (meta) backends are fine because each node picks its
		// own variant. Tooling can recover by hitting
		// POST /api/nodes/{id}/backends/install per target node.
		if mgs.backendApplier.BackendManager().IsDistributed() && input.ID != "" {
			if guard := concreteFanOutGuard(c, mgs.galleries, systemState, input.ID); guard != nil {
				return guard
			}
		}

		uuid, err := uuid.NewUUID()
		if err != nil {
			return err
@@ -120,6 +132,66 @@ func (mgs *BackendEndpointService) ApplyBackendEndpoint() echo.HandlerFunc {
	}
}

// concreteFanOutGuard returns a 409 response if the requested backend is a
// hardware-specific build (not auto-resolving / meta) and we are in
// distributed mode. It looks up the backend in the configured galleries; if
// the lookup itself fails (gallery unreachable, name not found), the guard
// stays out of the way and lets the install enqueue normally — a missing
// name will surface from the worker as a clearer error than the guard could
// produce here. The response body deliberately speaks human, with `code` and
// `meta_alternative` as the programmatic contract for tooling.
func concreteFanOutGuard(c echo.Context, galleries []config.Gallery, systemState *system.SystemState, backendID string) error {
	// Use the unfiltered listing because in distributed mode the frontend's
	// hardware is irrelevant — the install targets workers, not us — and the
	// filtered list would hide variants that don't match the frontend host
	// (e.g. a CUDA build on a CPU-only frontend), preventing the guard from
	// firing for exactly the cases it's meant to protect against.
	available, err := gallery.AvailableBackendsUnfiltered(galleries, systemState)
	if err != nil {
		return nil
	}
	requested := available.FindByName(backendID)
	if requested == nil || requested.IsMeta() {
		return nil
	}

	// Try to find an auto-resolving (meta) backend that has this concrete
	// variant in its CapabilitiesMap, so we can suggest it as a one-shot
	// alternative. Optional — empty string is fine if no parent exists.
	metaAlternative := ""
	for _, b := range available {
		if !b.IsMeta() {
			continue
		}
		for _, concrete := range b.CapabilitiesMap {
			if concrete == backendID {
				metaAlternative = b.Name
				break
			}
		}
		if metaAlternative != "" {
			break
		}
	}

	msg := fmt.Sprintf(
		"Backend %q is a hardware-specific build and won't run correctly on every node in this cluster. In distributed mode, install it on specific nodes:\n\n POST /api/nodes/{node_id}/backends/install\n {\"backend\": %q}",
		backendID, backendID,
	)
	if metaAlternative != "" {
		msg += fmt.Sprintf(
			"\n\nTo install across all nodes, use the auto-resolving backend %q — each node picks its own variant based on its hardware.",
			metaAlternative,
		)
	}

	return c.JSON(409, map[string]any{
		"error":            msg,
		"code":             "concrete_backend_requires_target",
		"meta_alternative": metaAlternative,
	})
}

// DeleteBackendEndpoint lets delete backends from a LocalAI instance
// @Summary delete backends from LocalAI.
// @Tags backends
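`concreteFanOutGuard` only fills in `meta_alternative` when some auto-resolving backend lists the requested build among the values of its `CapabilitiesMap`. Reduced to plain Go types, the same search might look like the sketch below.

`galleryBackend`, `isMeta`, and `findMetaAlternative` are hypothetical stand-ins for the gallery types. Treating "has a non-empty `CapabilitiesMap`" as the meta test is an assumption, and the real `IsMeta()` may be defined differently. The backend names in `main` are examples only.

```go
package main

import "fmt"

// galleryBackend is a hypothetical, trimmed-down gallery entry. A meta
// backend maps hardware capabilities to concrete build names.
type galleryBackend struct {
	Name            string
	CapabilitiesMap map[string]string // capability -> concrete backend name
}

// isMeta assumes "meta" means "has at least one capability mapping".
func (b galleryBackend) isMeta() bool { return len(b.CapabilitiesMap) > 0 }

// findMetaAlternative returns the first meta backend, in slice order, that
// lists backendID as one of its concrete variants, or "" if none does.
func findMetaAlternative(available []galleryBackend, backendID string) string {
	for _, b := range available {
		if !b.isMeta() {
			continue
		}
		for _, concrete := range b.CapabilitiesMap {
			if concrete == backendID {
				return b.Name
			}
		}
	}
	return ""
}

func main() {
	available := []galleryBackend{
		{Name: "cuda12-vllm"},
		{Name: "vllm", CapabilitiesMap: map[string]string{"nvidia": "cuda12-vllm", "default": "cpu-vllm"}},
	}
	fmt.Printf("%q\n", findMetaAlternative(available, "cuda12-vllm"))  // "vllm"
	fmt.Printf("%q\n", findMetaAlternative(available, "rocm-whisper")) // ""
}
```

Returning from inside the inner loop replaces the guard's flag-and-double-`break` pattern. Both stop at the first matching meta backend in the order of `available`. The random iteration order of Go maps does not change the result, because all the values of a single `CapabilitiesMap` are checked before moving on.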

@@ -73,6 +73,10 @@ type RegisterNodeRequest struct {
	AvailableRAM uint64            `json:"available_ram,omitempty"`
	GPUVendor    string            `json:"gpu_vendor,omitempty"`
	Labels       map[string]string `json:"labels,omitempty"`
	// MaxReplicasPerModel is the per-node cap on replicas of any single model.
	// Workers older than this field omit it; we coerce 0 → 1 below to preserve
	// historical single-replica behavior.
	MaxReplicasPerModel int `json:"max_replicas_per_model,omitempty"`
}

// RegisterNodeEndpoint registers a new backend node.
@@ -131,17 +135,26 @@ func RegisterNodeEndpoint(registry *nodes.NodeRegistry, expectedToken string, au
			tokenHash = hex.EncodeToString(h[:])
		}

		// Coerce 0 → 1 for backward compat with workers that don't send the field.
		// GORM's `default:1` only fires for a missing column; once Go zero-values
		// reach the struct field they're written as 0 unless explicitly set here.
		maxReplicasPerModel := req.MaxReplicasPerModel
		if maxReplicasPerModel < 1 {
			maxReplicasPerModel = 1
		}

		node := &nodes.BackendNode{
			Name:          req.Name,
			NodeType:      nodeType,
			Address:       req.Address,
			HTTPAddress:   req.HTTPAddress,
			TokenHash:     tokenHash,
			TotalVRAM:     req.TotalVRAM,
			AvailableVRAM: req.AvailableVRAM,
			TotalRAM:      req.TotalRAM,
			AvailableRAM:  req.AvailableRAM,
			GPUVendor:     req.GPUVendor,
			Name:                req.Name,
			NodeType:            nodeType,
			Address:             req.Address,
			HTTPAddress:         req.HTTPAddress,
			TokenHash:           tokenHash,
			TotalVRAM:           req.TotalVRAM,
			AvailableVRAM:       req.AvailableVRAM,
			TotalRAM:            req.TotalRAM,
			AvailableRAM:        req.AvailableRAM,
			GPUVendor:           req.GPUVendor,
			MaxReplicasPerModel: maxReplicasPerModel,
		}

		ctx := c.Request().Context()
@@ -363,6 +376,9 @@ func ResumeNodeEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
}

// InstallBackendOnNodeEndpoint triggers backend installation on a worker node via NATS.
// Backend can be either a gallery ID (resolved against BackendGalleries) or a
// direct URI install (URI + Name + optional Alias) — same shape as the
// standalone /api/backends/install-external path, just scoped to one node.
func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.HandlerFunc {
	return func(c echo.Context) error {
		if unloader == nil {
@@ -372,17 +388,27 @@ func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.Handler
		var req struct {
			Backend          string `json:"backend"`
			BackendGalleries string `json:"backend_galleries,omitempty"`
			URI              string `json:"uri,omitempty"`
			Name             string `json:"name,omitempty"`
			Alias            string `json:"alias,omitempty"`
		}
		if err := c.Bind(&req); err != nil || req.Backend == "" {
			return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name required"))
		if err := c.Bind(&req); err != nil {
			return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "invalid request body"))
		}
		reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, "", "", "")
		// Either a gallery backend name or a direct URI must be supplied.
		if req.Backend == "" && req.URI == "" {
			return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name or uri required"))
		}
		// Admin-driven backend install: not tied to a specific replica slot
		// (no model is being loaded). Pass replica 0 to match the worker's
		// admin process-key convention (`backend#0`).
		reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, req.URI, req.Name, req.Alias, 0)
		if err != nil {
			xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "error", err)
			xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "uri", req.URI, "error", err)
			return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to install backend on node"))
		}
		if !reply.Success {
			xlog.Error("Backend install failed on node", "node", nodeID, "backend", req.Backend, "error", reply.Error)
			xlog.Error("Backend install failed on node", "node", nodeID, "backend", req.Backend, "uri", req.URI, "error", reply.Error)
			return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "backend installation failed"))
		}
		return c.JSON(http.StatusOK, map[string]string{"message": "backend installed"})
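This per-node install endpoint is the recovery path that the 409 from `concreteFanOutGuard` points tooling at. The guard documents `code` and `meta_alternative` as the machine-readable part of that response.

The minimal client-side sketch below consumes that contract:
- If a meta backend is offered, retry the cluster-wide apply with it.
- Otherwise, fall back to per-node installs.

`nextStep` and its return strings are hypothetical. Only the three JSON field names and the `concrete_backend_requires_target` code come from the diff.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// fanOutConflict mirrors the JSON body concreteFanOutGuard returns with a 409.
type fanOutConflict struct {
	Error           string `json:"error"`
	Code            string `json:"code"`
	MetaAlternative string `json:"meta_alternative"`
}

// nextStep is a hypothetical helper that inspects the status and body of a
// backend apply response and suggests what tooling should do next.
func nextStep(status int, body []byte) (string, error) {
	if status == http.StatusOK {
		return "done", nil
	}
	if status != http.StatusConflict {
		return "", fmt.Errorf("unexpected status %d", status)
	}
	var c fanOutConflict
	if err := json.Unmarshal(body, &c); err != nil {
		return "", fmt.Errorf("decode 409 body: %w", err)
	}
	if c.Code != "concrete_backend_requires_target" {
		return "", fmt.Errorf("unexpected conflict code %q", c.Code)
	}
	if c.MetaAlternative != "" {
		// An auto-resolving parent exists: re-apply it cluster-wide.
		return "apply " + c.MetaAlternative, nil
	}
	// No parent: install the concrete build on chosen nodes via
	// POST /api/nodes/{node_id}/backends/install.
	return "install per node", nil
}

func main() {
	body := []byte(`{"error":"...","code":"concrete_backend_requires_target","meta_alternative":"vllm"}`)
	fmt.Println(nextStep(http.StatusConflict, body)) // apply vllm <nil>
}
```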
@@ -457,8 +483,8 @@ func UnloadModelOnNodeEndpoint(unloader nodes.NodeCommandSender, registry *nodes
			xlog.Error("Failed to stop backend after model unload", "node", nodeID, "model", req.ModelName, "error", err)
			return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "model unloaded but backend stop failed"))
		}
		// Remove from registry
		registry.RemoveNodeModel(c.Request().Context(), nodeID, req.ModelName)
		// Remove every replica of this model on the node from the registry.
		registry.RemoveAllNodeModelReplicas(c.Request().Context(), nodeID, req.ModelName)
		return c.JSON(http.StatusOK, map[string]string{"message": "model unloaded"})
	}
}
@@ -484,7 +510,7 @@ func DeleteModelOnNodeEndpoint(unloader nodes.NodeCommandSender, registry *nodes
			// Non-fatal — backend process may not be running
			xlog.Warn("StopBackend failed during model deletion (non-fatal)", "node", nodeID, "model", req.ModelName, "error", err)
		}
		registry.RemoveNodeModel(c.Request().Context(), nodeID, req.ModelName)
		registry.RemoveAllNodeModelReplicas(c.Request().Context(), nodeID, req.ModelName)
		return c.JSON(http.StatusOK, map[string]string{"message": "model deleted from node"})
	}
}
@@ -659,6 +685,78 @@ func GetNodeLabelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
	}
}

// UpdateMaxReplicasPerModelRequest is the body for the per-node replica cap endpoint.
type UpdateMaxReplicasPerModelRequest struct {
	// Value is the new per-model replica cap on this node. Must be >= 1.
	Value int `json:"value"`
}

// UpdateMaxReplicasPerModelEndpoint sets the per-node cap on how many replicas
// of any one model can be loaded concurrently. The corresponding
// `node.replica-slots` auto-label is refreshed so existing AND-selectors keep
// matching, and any unsatisfiable scheduling cooldowns are cleared so the
// reconciler retries on the next tick.
//
// This is a transient admin override — a worker re-registration restores the
// value the worker was started with (--max-replicas-per-model). For permanent
// fleet changes, change the worker flag.
//
// @Summary Update a node's max replicas per model
// @Tags Nodes
// @Param id path string true "Node ID"
// @Param request body UpdateMaxReplicasPerModelRequest true "New value"
// @Success 200 {object} map[string]int
// @Failure 400 {object} map[string]any "value must be >= 1"
// @Failure 404 {object} map[string]any "node not found"
// @Router /api/nodes/{id}/max-replicas-per-model [put]
func UpdateMaxReplicasPerModelEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
	return func(c echo.Context) error {
		ctx := c.Request().Context()
		nodeID := c.Param("id")
		if _, err := registry.Get(ctx, nodeID); err != nil {
			return c.JSON(http.StatusNotFound, nodeError(http.StatusNotFound, "node not found"))
		}
		var req UpdateMaxReplicasPerModelRequest
		if err := c.Bind(&req); err != nil {
			return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "invalid request body"))
		}
		if req.Value < 1 {
			return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "value must be >= 1"))
		}
		if err := registry.UpdateMaxReplicasPerModel(ctx, nodeID, req.Value); err != nil {
			xlog.Error("Failed to update max_replicas_per_model", "node", nodeID, "value", req.Value, "error", err)
			return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to update max replicas per model"))
		}
		return c.JSON(http.StatusOK, map[string]int{"max_replicas_per_model": req.Value})
	}
}

// ResetMaxReplicasPerModelEndpoint clears the admin override on a node, so
// the next worker re-registration is allowed to update the value from its
// CLI flag again. The current value is left in place until the worker calls
// register.
//
// @Summary Reset a node's max replicas per model to the worker default
// @Tags Nodes
// @Param id path string true "Node ID"
// @Success 200 {object} map[string]bool
// @Failure 404 {object} map[string]any "node not found"
// @Router /api/nodes/{id}/max-replicas-per-model [delete]
func ResetMaxReplicasPerModelEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
	return func(c echo.Context) error {
		ctx := c.Request().Context()
		nodeID := c.Param("id")
		if _, err := registry.Get(ctx, nodeID); err != nil {
			return c.JSON(http.StatusNotFound, nodeError(http.StatusNotFound, "node not found"))
		}
		if err := registry.ResetMaxReplicasPerModel(ctx, nodeID); err != nil {
			xlog.Error("Failed to reset max_replicas_per_model override", "node", nodeID, "error", err)
			return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to reset override"))
		}
		return c.JSON(http.StatusOK, map[string]bool{"reset": true})
	}
}

// SetNodeLabelsEndpoint replaces all labels for a node.
func SetNodeLabelsEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
	return func(c echo.Context) error {

@@ -1,29 +1,32 @@
import { test, expect } from '@playwright/test'

test.describe('Manage Page - Backend Logs Link', () => {
  test('models table shows terminal icon for logs', async ({ page }) => {
  test('row action menu exposes Backend logs entry with terminal icon', async ({ page }) => {
    await page.goto('/app/manage')
    // Wait for models to load
    await expect(page.locator('.table')).toBeVisible({ timeout: 10_000 })

    // Check for terminal icon (backend logs link)
    const terminalIcon = page.locator('a[title="Backend logs"] i.fa-terminal')
    await expect(terminalIcon.first()).toBeVisible()
    // Row actions live behind the kebab (ActionMenu) — open the first row's menu.
    const trigger = page.locator('button.action-menu__trigger').first()
    await expect(trigger).toBeVisible()
    await trigger.click()

    const logsItem = page.getByRole('menuitem', { name: 'Backend logs' })
    await expect(logsItem).toBeVisible()
    await expect(logsItem.locator('i.fa-terminal')).toBeVisible()
  })

  test('terminal icon links to backend-logs page', async ({ page }) => {
  test('Backend logs menu item navigates to backend-logs page', async ({ page }) => {
    await page.goto('/app/manage')
    await expect(page.locator('.table')).toBeVisible({ timeout: 10_000 })

    const logsLink = page.locator('a[title="Backend logs"]').first()
    await expect(logsLink).toBeVisible()
    const trigger = page.locator('button.action-menu__trigger').first()
    await expect(trigger).toBeVisible()
    await trigger.click()

    // Link uses href="#" with onClick for navigation
    const href = await logsLink.getAttribute('href')
    expect(href).toBe('#')
    const logsItem = page.getByRole('menuitem', { name: 'Backend logs' })
    await expect(logsItem).toBeVisible()
    await logsItem.click()

    // Click and verify navigation
    await logsLink.click()
    await expect(page).toHaveURL(/\/app\/backend-logs\//)
  })
})

166
core/http/react-ui/e2e/nodes-per-node-backend-actions.spec.js
Normal file
@@ -0,0 +1,166 @@
import { test, expect } from '@playwright/test'

// These specs cover the per-node backend row in the Nodes page:
// - the upgrade affordance is self-explanatory (icon + tooltip)
// - a delete affordance is present and goes through ConfirmDialog
//
// We mock the distributed-mode API so the tests can run against the
// standalone ui-test-server without spinning up workers/NATS.

const NODE_ID = 'test-node-1'
const NODE_NAME = 'worker-test'
const BACKEND_NAME = 'cuda12-vllm-development'

async function mockDistributedNodes(page, { onDelete } = {}) {
  await page.route('**/api/nodes', (route) => {
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify([
        {
          id: NODE_ID,
          name: NODE_NAME,
          node_type: 'backend',
          address: '10.0.0.1:50051',
          http_address: '10.0.0.1:8090',
          status: 'healthy',
          total_vram: 0,
          available_vram: 0,
          total_ram: 8_000_000_000,
          available_ram: 4_000_000_000,
          gpu_vendor: '',
          last_heartbeat: new Date().toISOString(),
          created_at: new Date().toISOString(),
          updated_at: new Date().toISOString(),
        },
      ]),
    })
  })

  await page.route('**/api/nodes/scheduling', (route) => {
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: '[]',
    })
  })

  await page.route(`**/api/nodes/${NODE_ID}/models`, (route) => {
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: '[]',
    })
  })

  await page.route(`**/api/nodes/${NODE_ID}/backends`, (route) => {
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify([
        {
          name: BACKEND_NAME,
          is_system: false,
          is_meta: false,
          installed_at: new Date().toISOString(),
        },
      ]),
    })
  })

  await page.route(`**/api/nodes/${NODE_ID}/backends/delete`, async (route) => {
    if (onDelete) {
      await onDelete(route)
    }
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ message: 'backend deleted' }),
    })
  })
}

async function expandNodeAndWaitForBackends(page) {
  await page.goto('/app/nodes')
  // Click the row to expand it. The chevron toggle and the row both work,
  // but clicking the name cell is the most user-like.
  await page.getByText(NODE_NAME).first().click()
  // Backends, Capacity and Labels live behind a "Manage" <details>
  // disclosure (the drawer was distilled to keep at-a-glance content
  // lean — see distill refactor in the multi-replica branch). Open it
  // by clicking the summary inside the .node-manage scope so the
  // per-node backend table is in the DOM before assertions run.
  await page.locator('.node-manage > summary').first().click()
  await expect(page.getByRole('cell', { name: BACKEND_NAME, exact: true })).toBeVisible({ timeout: 10_000 })
}

test.describe('Nodes page — per-node backend actions', () => {
  test('upgrade affordance is self-explanatory (not "Reinstall backend" with a sync icon)', async ({ page }) => {
    await mockDistributedNodes(page)
    await expandNodeAndWaitForBackends(page)

    // Negative: the old, ambiguous wording must not be used.
    await expect(page.locator('button[title="Reinstall backend"]')).toHaveCount(0)
    await expect(page.locator('button[title="Reinstall backend"] i.fa-sync-alt')).toHaveCount(0)

    // Positive: a self-explanatory upgrade affordance is rendered next to the
    // backend row. We accept either an arrow-up or arrows-rotate glyph; both
    // map to "upgrade" semantics in FontAwesome 6 unambiguously.
    const upgradeBtn = page.locator('button[title="Upgrade backend on this node"]')
    await expect(upgradeBtn).toBeVisible()
    const iconClass = await upgradeBtn.locator('i').getAttribute('class')
    expect(iconClass).toMatch(/fa-(arrow-up|arrows-rotate|up-long)/)
  })

  test('per-node backend row shows a delete (trash) button next to upgrade', async ({ page }) => {
    await mockDistributedNodes(page)
    await expandNodeAndWaitForBackends(page)

    const deleteBtn = page.locator('button[title="Delete backend from this node"]')
    await expect(deleteBtn).toBeVisible()
    await expect(deleteBtn.locator('i.fa-trash')).toBeVisible()
  })

  test('clicking delete opens the confirm dialog and POSTs to the per-node delete endpoint', async ({ page }) => {
    let postedBody = null
    await mockDistributedNodes(page, {
      onDelete: async (route) => {
        postedBody = route.request().postDataJSON()
      },
    })
    await expandNodeAndWaitForBackends(page)

    await page.locator('button[title="Delete backend from this node"]').click()

    // ConfirmDialog uses role="alertdialog" and a danger confirm button.
    const dialog = page.getByRole('alertdialog')
    await expect(dialog).toBeVisible()
    const confirmBtn = dialog.locator('button.btn-danger')
    await expect(confirmBtn).toBeVisible()
    await confirmBtn.click()

    // Wait until the POST landed.
    await expect.poll(() => postedBody, { timeout: 5_000 }).toEqual({ backend: BACKEND_NAME })
  })

  test('clicking delete and cancelling does not POST', async ({ page }) => {
    let deleteCalls = 0
    await mockDistributedNodes(page, {
      onDelete: () => {
        deleteCalls += 1
      },
    })
    await expandNodeAndWaitForBackends(page)

    await page.locator('button[title="Delete backend from this node"]').click()

    const dialog = page.getByRole('alertdialog')
    await expect(dialog).toBeVisible()
    await dialog.getByRole('button', { name: /cancel/i }).click()
    await expect(dialog).toBeHidden()

    // Give any errant request a moment to fire so a regression would be caught.
    await page.waitForTimeout(500)
    expect(deleteCalls).toBe(0)
  })
})
@@ -7,7 +7,7 @@
    <link rel="icon" type="image/svg+xml" href="/favicon.svg" />
    <link rel="preconnect" href="https://fonts.googleapis.com" />
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
    <link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&family=JetBrains+Mono:wght@400;500;600&display=swap" rel="stylesheet" />
    <link href="https://fonts.googleapis.com/css2?family=Geist:wght@300..700&family=Geist+Mono:wght@300..700&display=swap" rel="stylesheet" />
  </head>
  <body>
    <div id="root"></div>

7
core/http/react-ui/package-lock.json
generated
@@ -3258,9 +3258,9 @@
      }
    },
    "node_modules/postcss": {
      "version": "8.5.8",
      "resolved": "https://registry.npmjs.org/postcss/-/postcss-8.5.8.tgz",
      "integrity": "sha512-OW/rX8O/jXnm82Ey1k44pObPtdblfiuWnrd8X7GJ7emImCOstunGbXUpp7HdBrFQX6rJzn3sPT397Wp5aCwCHg==",
      "version": "8.5.10",
      "resolved": "https://registry.npmjs.org/postcss/-/postcss-8.5.10.tgz",
      "integrity": "sha512-pMMHxBOZKFU6HgAZ4eyGnwXF/EvPGGqUr0MnZ5+99485wwW41kW91A4LOGxSHhgugZmSChL5AlElNdwlNgcnLQ==",
      "dev": true,
      "funding": [
        {
@@ -3276,6 +3276,7 @@
          "url": "https://github.com/sponsors/ai"
        }
      ],
      "license": "MIT",
      "dependencies": {
        "nanoid": "^3.3.11",
        "picocolors": "^1.1.1",

File diff suppressed because it is too large
@@ -1,9 +1,11 @@
|
||||
import { useState, useEffect } from 'react'
|
||||
import { Outlet, useLocation } from 'react-router-dom'
|
||||
import { useState, useEffect, useRef } from 'react'
|
||||
import { Outlet, useLocation, useNavigate } from 'react-router-dom'
|
||||
import Sidebar from './components/Sidebar'
|
||||
import OperationsBar from './components/OperationsBar'
|
||||
import { ToastContainer, useToast } from './components/Toast'
|
||||
import { systemApi } from './utils/api'
|
||||
import { useTheme } from './contexts/ThemeContext'
|
||||
import { useAuth } from './context/AuthContext'
|
||||
|
||||
const COLLAPSED_KEY = 'localai_sidebar_collapsed'
|
||||
|
||||
@@ -15,6 +17,10 @@ export default function App() {
|
||||
const { toasts, addToast, removeToast } = useToast()
|
||||
const [version, setVersion] = useState('')
|
||||
const location = useLocation()
|
||||
const navigate = useNavigate()
|
||||
const { theme, toggleTheme } = useTheme()
|
||||
const { authEnabled, user } = useAuth()
|
||||
  const hamburgerRef = useRef(null)
  const isChatRoute = location.pathname.match(/\/chat(\/|$)/) || location.pathname.match(/\/agents\/[^/]+\/chat/)

  useEffect(() => {
@@ -34,26 +40,80 @@ export default function App() {
    window.scrollTo(0, 0)
  }, [location.pathname])

  // Drawer polish: lock body scroll, close on Escape, return focus to the
  // hamburger when the drawer closes. Only engages when the drawer is open;
  // desktop and tablet rail mode are unaffected.
  useEffect(() => {
    if (!sidebarOpen) return
    const prevOverflow = document.body.style.overflow
    document.body.style.overflow = 'hidden'
    const onKey = (e) => { if (e.key === 'Escape') setSidebarOpen(false) }
    window.addEventListener('keydown', onKey)
    return () => {
      document.body.style.overflow = prevOverflow
      window.removeEventListener('keydown', onKey)
      // Restore focus to the trigger so keyboard users land back where
      // they invoked the drawer from.
      hamburgerRef.current?.focus()
    }
  }, [sidebarOpen])

  const layoutClasses = [
    'app-layout',
    isChatRoute ? 'app-layout-chat' : '',
    sidebarCollapsed ? 'sidebar-is-collapsed' : '',
  ].filter(Boolean).join(' ')

  const showAvatar = authEnabled && user
  const accountLabel = user?.name || user?.email || 'Account'

  return (
    <div className={layoutClasses}>
      <Sidebar isOpen={sidebarOpen} onClose={() => setSidebarOpen(false)} />
      <main className="main-content" {...(sidebarOpen ? { 'aria-hidden': 'true', inert: '' } : {})}>
        <OperationsBar />
        {/* Mobile header — primary actions reachable without opening the
            drawer. Hamburger is the only way to expand the nav on phones;
            theme toggle and account avatar are mirrored from the sidebar
            footer so they remain one tap away. */}
        <header className="mobile-header">
          <button
            ref={hamburgerRef}
            className="hamburger-btn"
            onClick={() => setSidebarOpen(true)}
            aria-label="Open menu"
            aria-expanded={sidebarOpen}
            aria-controls="app-sidebar"
          >
            <i className="fas fa-bars" aria-hidden="true" />
          </button>
          <span className="mobile-title">LocalAI</span>
          <div className="mobile-header-actions">
            <button
              type="button"
              className="mobile-header-btn"
              onClick={toggleTheme}
              aria-label={`Switch to ${theme === 'dark' ? 'light' : 'dark'} mode`}
              title={`Switch to ${theme === 'dark' ? 'light' : 'dark'} mode`}
            >
              <i className={`fas ${theme === 'dark' ? 'fa-sun' : 'fa-moon'}`} aria-hidden="true" />
            </button>
            {showAvatar && (
              <button
                type="button"
                className="mobile-header-btn mobile-header-avatar"
                onClick={() => navigate('/app/account')}
                aria-label={`Account: ${accountLabel}`}
                title={accountLabel}
              >
                {user.avatarUrl ? (
                  <img src={user.avatarUrl} alt="" />
                ) : (
                  <i className="fas fa-user-circle" aria-hidden="true" />
                )}
              </button>
            )}
          </div>
        </header>
        <div className="main-content-inner">
          <div className="page-transition" key={location.pathname}>
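The drawer effect in `App` pairs every side effect with its undo in the cleanup: scroll lock with restore, listener with removal, and a focus hand-back on close. A framework-agnostic sketch of that pairing follows. It assumes a hypothetical `lockDrawer` helper (not part of LocalAI) that receives the document, window, trigger element and close callback explicitly, so it can run and be checked without React or a browser.

```javascript
// Hypothetical helper mirroring App's drawer effect outside React.
// Returns a cleanup function, the same contract a useEffect callback has.
function lockDrawer({ doc, win, trigger, onClose }) {
  // Remember the previous value so cleanup restores it instead of forcing ''.
  const prevOverflow = doc.body.style.overflow
  doc.body.style.overflow = 'hidden'

  // Escape closes the drawer; the same function reference is removed later.
  const onKey = (e) => { if (e.key === 'Escape') onClose() }
  win.addEventListener('keydown', onKey)

  return () => {
    doc.body.style.overflow = prevOverflow
    win.removeEventListener('keydown', onKey)
    // Send keyboard users back to the control that opened the drawer.
    trigger?.focus()
  }
}
```

Restoring the saved `overflow` rather than clearing it means the cleanup does not undo a scroll lock that something else had already applied before the drawer opened.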
141
core/http/react-ui/src/components/ActionMenu.jsx
Normal file
@@ -0,0 +1,141 @@
import { useRef, useState, useEffect, useCallback } from 'react'
import Popover from './Popover'

// ActionMenu renders a kebab (three-dot) button that opens a popover with a
// list of row actions. Replaces the inline cluster of icon buttons that made
// dense tables feel like a control panel — actions stay out of the way until
// the user reaches for them, the way Linear/Vercel/Notion handle row menus.
//
// Items shape:
//   { key, icon?, label, onClick, danger?, disabled?, hidden?, shortcut? }
//   { divider: true }                  // visual separator
//   { type: 'badge', icon?, label }    // non-interactive badge row
//
// Hidden items are filtered out so callers can write conditional menus
// inline (`{ key: 'stop', hidden: !isRunning, ... }` style) without ternaries.
//
// Keyboard:
//   ArrowUp / ArrowDown   move highlight (skipping dividers + badges)
//   Home / End            jump to the first / last item
//   Enter / Space         activate
//   Escape                close, return focus to trigger
export default function ActionMenu({ items, ariaLabel = 'Actions', triggerLabel, compact = false }) {
  const triggerRef = useRef(null)
  const [open, setOpen] = useState(false)
  const [activeIdx, setActiveIdx] = useState(-1)

  const interactive = (Array.isArray(items) ? items : []).filter(it => it && !it.divider && it.type !== 'badge' && !it.hidden)
  const visible = (Array.isArray(items) ? items : []).filter(it => it && !it.hidden)

  const close = useCallback(() => {
    setOpen(false)
    setActiveIdx(-1)
  }, [])

  // Move highlight to the first interactive item when opening, so keyboard
  // users land somewhere meaningful instead of having to arrow into the menu.
  useEffect(() => {
    if (open && activeIdx === -1 && interactive.length > 0) {
      setActiveIdx(0)
    }
  }, [open, activeIdx, interactive.length])

  const handleTriggerKeyDown = (e) => {
    if (e.key === 'ArrowDown' || e.key === 'Enter' || e.key === ' ') {
      e.preventDefault()
      e.stopPropagation()
      setOpen(true)
    }
  }

  const handleMenuKeyDown = (e) => {
    if (e.key === 'ArrowDown') {
      e.preventDefault()
      setActiveIdx(i => Math.min(interactive.length - 1, (i < 0 ? -1 : i) + 1))
    } else if (e.key === 'ArrowUp') {
      e.preventDefault()
      setActiveIdx(i => Math.max(0, (i < 0 ? interactive.length : i) - 1))
    } else if (e.key === 'Home') {
      e.preventDefault()
      setActiveIdx(0)
    } else if (e.key === 'End') {
      e.preventDefault()
      setActiveIdx(interactive.length - 1)
    } else if (e.key === 'Enter' || e.key === ' ') {
      e.preventDefault()
      const item = interactive[activeIdx]
      if (item && !item.disabled) {
        close()
        item.onClick?.()
      }
    }
  }

  if (interactive.length === 0 && !visible.some(it => it.type === 'badge')) {
    return null
  }

  return (
    <>
      <button
        ref={triggerRef}
        type="button"
        className={`action-menu__trigger${compact ? ' action-menu__trigger--compact' : ''}${open ? ' is-open' : ''}`}
        aria-haspopup="menu"
        aria-expanded={open}
        aria-label={triggerLabel || ariaLabel}
        onClick={(e) => { e.stopPropagation(); setOpen(v => !v) }}
        onKeyDown={handleTriggerKeyDown}
      >
        <i className="fas fa-ellipsis-vertical" />
      </button>
      <Popover anchor={triggerRef} open={open} onClose={close} ariaLabel={ariaLabel}>
        <div
          role="menu"
          aria-label={ariaLabel}
          className="action-menu"
          onKeyDown={handleMenuKeyDown}
          // Capture focus when the menu opens so arrow keys work without the
          // user clicking inside first.
          tabIndex={-1}
          ref={el => { if (el && open) el.focus() }}
        >
          {visible.map((item, i) => {
            if (item.divider) {
              return <div key={`d-${i}`} className="action-menu__divider" role="separator" />
            }
            if (item.type === 'badge') {
              return (
                <div key={item.key || `b-${i}`} className="action-menu__badge" role="presentation">
                  {item.icon && <i className={`fas ${item.icon}`} aria-hidden="true" />}
                  <span>{item.label}</span>
                </div>
              )
            }
            const idx = interactive.indexOf(item)
            const active = idx === activeIdx
            return (
              <button
                key={item.key}
                type="button"
                role="menuitem"
                disabled={item.disabled}
                className={`action-menu__item${item.danger ? ' is-danger' : ''}${active ? ' is-active' : ''}`}
                onMouseEnter={() => setActiveIdx(idx)}
                onClick={(e) => {
                  e.stopPropagation()
                  if (item.disabled) return
                  close()
                  item.onClick?.()
                }}
              >
                {item.icon && <i className={`fas ${item.icon} action-menu__icon`} aria-hidden="true" />}
                <span className="action-menu__label">{item.label}</span>
                {item.shortcut && <span className="action-menu__shortcut">{item.shortcut}</span>}
              </button>
            )
          })}
        </div>
      </Popover>
    </>
  )
}
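ActionMenu's keyboard handling reduces to index arithmetic over the interactive items, with `-1` meaning "nothing highlighted". A pure restatement, assuming hypothetical `nextActiveIndex` and `partitionItems` helpers that are not part of the component, makes the clamping (no wrap-around at either end) easy to check without rendering anything:

```javascript
// Hypothetical pure version of ActionMenu's highlight movement.
// `count` is the number of interactive items (dividers and badges excluded).
function nextActiveIndex(key, current, count) {
  // Guard: an empty menu has nothing to highlight.
  if (count === 0) return -1
  switch (key) {
    case 'ArrowDown':
      // From "nothing" (-1) the first press lands on item 0; clamps at the end.
      return Math.min(count - 1, current + 1)
    case 'ArrowUp':
      // From "nothing" the first press lands on the last item; clamps at 0.
      return Math.max(0, (current < 0 ? count : current) - 1)
    case 'Home':
      return 0
    case 'End':
      return count - 1
    default:
      return current
  }
}

// Split items the way ActionMenu does: hidden entries vanish entirely,
// dividers and badges render but never take the keyboard highlight.
function partitionItems(items) {
  const list = Array.isArray(items) ? items : []
  const visible = list.filter(it => it && !it.hidden)
  const interactive = visible.filter(it => !it.divider && it.type !== 'badge')
  return { visible, interactive }
}
```

With the arithmetic factored out like this, the component's handler could delegate via `setActiveIdx(i => nextActiveIndex(e.key, i, interactive.length))`, leaving only activation (Enter / Space) inline.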
@@ -80,7 +80,7 @@ export default function ClientMCPDropdown({
           placeholder="Server URL (e.g. https://mcp.example.com/sse)"
           value={url}
           onChange={e => setUrl(e.target.value)}
-          style={{ width: '100%', marginBottom: '4px' }}
+          style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
         />
         <input
           type="text"
@@ -88,7 +88,7 @@ export default function ClientMCPDropdown({
           placeholder="Name (optional)"
           value={name}
           onChange={e => setName(e.target.value)}
-          style={{ width: '100%', marginBottom: '4px' }}
+          style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
         />
         <input
           type="password"
@@ -96,13 +96,13 @@ export default function ClientMCPDropdown({
           placeholder="Auth token (optional)"
           value={authToken}
           onChange={e => setAuthToken(e.target.value)}
-          style={{ width: '100%', marginBottom: '4px' }}
+          style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
         />
         <label style={{ display: 'flex', alignItems: 'center', gap: '6px', fontSize: '0.8rem', marginBottom: '6px' }}>
           <input type="checkbox" checked={useProxy} onChange={e => setUseProxy(e.target.checked)} />
           Use CORS proxy
         </label>
-        <div style={{ display: 'flex', gap: '4px', justifyContent: 'flex-end' }}>
+        <div style={{ display: 'flex', gap: 'var(--spacing-xs)', justifyContent: 'flex-end' }}>
           <button type="button" className="btn btn-sm btn-secondary" onClick={() => setAddDialog(false)}>Cancel</button>
           <button type="button" className="btn btn-sm btn-primary" onClick={handleAdd} disabled={!url.trim()}>Add</button>
         </div>

@@ -135,7 +135,7 @@ function JsonEditor({ value, onChange }) {
         className="input"
         value={text}
         onChange={e => handleChange(e.target.value)}
-        style={{ width: '100%', minHeight: 80, fontFamily: 'monospace', fontSize: '0.8125rem', resize: 'vertical' }}
+        style={{ width: '100%', minHeight: 80, fontFamily: 'var(--font-mono)', fontSize: '0.8125rem', resize: 'vertical' }}
       />
       {parseError && <div style={{ color: 'var(--color-error)', fontSize: '0.75rem', marginTop: 2 }}>{parseError}</div>}
     </div>

@@ -158,7 +158,7 @@ export default function FieldBrowser({ fields, activeFieldPaths, onAddField }) {
                 {field.description}
               </div>
             )}
-            <div style={{ fontSize: '0.6875rem', color: 'var(--color-text-muted)', marginTop: 1, fontFamily: 'monospace' }}>
+            <div style={{ fontSize: '0.6875rem', color: 'var(--color-text-muted)', marginTop: 1, fontFamily: 'var(--font-mono)' }}>
               {field.path}
             </div>
           </div>
79
core/http/react-ui/src/components/GalleryLoader.jsx
Normal file
@@ -0,0 +1,79 @@
import { useState, useEffect } from 'react'

const LOADING_PHRASES = [
  { text: 'Loading models...', icon: 'fa-brain' },
  { text: 'Fetching gallery...', icon: 'fa-download' },
  { text: 'Checking availability...', icon: 'fa-circle-check' },
  { text: 'Almost ready...', icon: 'fa-hourglass-half' },
  { text: 'Preparing gallery...', icon: 'fa-store' },
]

// GalleryLoader is the animated skeleton used while the gallery list loads.
// Used by Models, Backends, and (now) the Manage page so an empty fetch state
// reads the same everywhere instead of one tab showing pulsing dots and the
// other showing "Loading...".
export default function GalleryLoader() {
  const [idx, setIdx] = useState(() => Math.floor(Math.random() * LOADING_PHRASES.length))
  const [fade, setFade] = useState(true)

  useEffect(() => {
    const interval = setInterval(() => {
      setFade(false)
      setTimeout(() => {
        setIdx(prev => (prev + 1) % LOADING_PHRASES.length)
        setFade(true)
      }, 300)
    }, 2800)
    return () => clearInterval(interval)
  }, [])

  const phrase = LOADING_PHRASES[idx]

  return (
    <div style={{
      display: 'flex', flexDirection: 'column', alignItems: 'center',
      justifyContent: 'center', padding: 'var(--spacing-xl) var(--spacing-md)',
      minHeight: '280px', gap: 'var(--spacing-lg)',
    }}>
      <div style={{ display: 'flex', gap: 'var(--spacing-sm)' }}>
        {[0, 1, 2, 3, 4].map(i => (
          <div key={i} style={{
            width: 10, height: 10, borderRadius: '50%',
            background: 'var(--color-primary)',
            animation: `galleryDot 1.4s ease-in-out ${i * 0.15}s infinite`,
          }} />
        ))}
      </div>
      <div style={{
        display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)',
        opacity: fade ? 1 : 0,
        transition: 'opacity 300ms ease',
        color: 'var(--color-text-secondary)',
        fontSize: '0.9375rem',
        fontWeight: 500,
      }}>
        <i className={`fas ${phrase.icon}`} style={{ color: 'var(--color-accent)', fontSize: '1.125rem' }} />
        {phrase.text}
      </div>
      <div style={{ width: '100%', maxWidth: '700px', display: 'flex', flexDirection: 'column', gap: '12px' }}>
        {[0.9, 0.7, 0.5].map((opacity, i) => (
          <div key={i} style={{
            height: '48px', borderRadius: 'var(--radius-md)',
            background: 'var(--color-bg-tertiary)', opacity,
            animation: `galleryShimmer 1.8s ease-in-out ${i * 0.2}s infinite`,
          }} />
        ))}
      </div>
      <style>{`
        @keyframes galleryDot {
          0%, 80%, 100% { transform: scale(0.4); opacity: 0.3; }
          40% { transform: scale(1); opacity: 1; }
        }
        @keyframes galleryShimmer {
          0%, 100% { opacity: var(--shimmer-base, 0.15); }
          50% { opacity: var(--shimmer-peak, 0.3); }
        }
      `}</style>
    </div>
  )
}
47
core/http/react-ui/src/components/ManageSummary.jsx
Normal file
@@ -0,0 +1,47 @@
import StatCard from './StatCard'

// ManageSummary anchors the Manage page with the same StatCard pattern the
// Nodes dashboard uses, so the page reads as a real overview rather than
// "two tabs in a hat". Counts are derived in-memory by the parent — this
// component is purely presentational. Cards are clickable and route the
// user to the relevant tab + filter.
export default function ManageSummary({
  modelsCount,
  backendsCount,
  runningCount,
  updatesCount,
  onCardClick,
}) {
  const click = (tab, filter) => onCardClick && onCardClick(tab, filter)

  return (
    <div className="stat-grid manage-summary">
      <StatCard
        icon="fas fa-brain"
        label="Models Installed"
        value={modelsCount}
        onClick={() => click('models', 'all')}
      />
      <StatCard
        icon="fas fa-server"
        label="Backends Installed"
        value={backendsCount}
        onClick={() => click('backends', 'all')}
      />
      <StatCard
        icon="fas fa-circle-play"
        label="Currently Running"
        value={runningCount}
        accentVar={runningCount > 0 ? '--color-success' : undefined}
        onClick={() => click('models', 'running')}
      />
      <StatCard
        icon="fas fa-arrow-up"
        label="Updates Available"
        value={updatesCount}
        accentVar={updatesCount > 0 ? '--color-warning' : undefined}
        onClick={() => click('backends', updatesCount > 0 ? 'upgradable' : 'all')}
      />
    </div>
  )
}
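ManageSummary's routing is easiest to audit as data. The sketch below restates the four cards as a plain array, assuming a hypothetical `summaryCards` helper (not part of LocalAI); each entry carries the `(tab, filter)` pair the component would hand to `onCardClick`.

```javascript
// Hypothetical data-driven description of ManageSummary's four cards.
// `target` is the [tab, filter] pair passed to onCardClick on click.
function summaryCards({ modelsCount, backendsCount, runningCount, updatesCount }) {
  return [
    { label: 'Models Installed', value: modelsCount, target: ['models', 'all'] },
    { label: 'Backends Installed', value: backendsCount, target: ['backends', 'all'] },
    {
      label: 'Currently Running',
      value: runningCount,
      // Accent only when something is actually running.
      accentVar: runningCount > 0 ? '--color-success' : undefined,
      target: ['models', 'running'],
    },
    {
      label: 'Updates Available',
      value: updatesCount,
      accentVar: updatesCount > 0 ? '--color-warning' : undefined,
      // With nothing to upgrade, fall back to the unfiltered backends list.
      target: ['backends', updatesCount > 0 ? 'upgradable' : 'all'],
    },
  ]
}
```

The only conditional routing is on the last card: an "Updates Available" click with a zero count lands on the full backends list rather than an empty `upgradable` filter.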
30
core/http/react-ui/src/components/MetaBadgeRow.jsx
Normal file
@@ -0,0 +1,30 @@
// MetaBadgeRow renders the System / User / Meta / Dev badge cluster the same
// way everywhere — Manage tabs and (in future) Install gallery. The badges
// already exist as classes; this component locks down the icons + labels so
// the same backend type doesn't read "User" in one tab and "downloaded" in
// another.
export default function MetaBadgeRow({ isSystem, isMeta, isDevelopment }) {
  return (
    <div className="badge-row">
      {isSystem ? (
        <span className="badge badge-info" title="Bundled with the LocalAI runtime">
          <i className="fas fa-shield-alt" /> System
        </span>
      ) : (
        <span className="badge badge-success" title="Installed from the gallery or external source">
          <i className="fas fa-download" /> User
        </span>
      )}
      {isMeta && (
        <span className="badge badge-accent" title="Meta backend — selects a concrete variant per node">
          <i className="fas fa-layer-group" /> Meta
        </span>
      )}
      {isDevelopment && (
        <span className="badge badge-warning" title="Marked as development / pre-release by the gallery">
          <i className="fas fa-flask" /> Dev
        </span>
      )}
    </div>
  )
}
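The point of MetaBadgeRow is that one set of flags always yields the same labels, icons and classes. Expressed as a pure mapping (a hypothetical `metaBadges` helper, not part of LocalAI), the rule is that exactly one of System/User always appears, followed by optional Meta and Dev badges in that order:

```javascript
// Hypothetical pure mapping from backend flags to the badges MetaBadgeRow
// renders, in render order.
function metaBadges({ isSystem, isMeta, isDevelopment }) {
  const badges = [
    // Exactly one origin badge: bundled runtime vs. user-installed.
    isSystem
      ? { label: 'System', className: 'badge-info', icon: 'fa-shield-alt' }
      : { label: 'User', className: 'badge-success', icon: 'fa-download' },
  ]
  if (isMeta) badges.push({ label: 'Meta', className: 'badge-accent', icon: 'fa-layer-group' })
  if (isDevelopment) badges.push({ label: 'Dev', className: 'badge-warning', icon: 'fa-flask' })
  return badges
}
```

A table like this could also back a gallery view that needs the same vocabulary without rendering the React component.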
668
core/http/react-ui/src/components/NodeInstallPicker.jsx
Normal file
@@ -0,0 +1,668 @@
import { useState, useMemo, useEffect, useRef } from 'react'
import Modal from './Modal'
import SearchableSelect from './SearchableSelect'
import { nodesApi } from '../utils/api'

// NodeInstallPicker is the single multi-node install surface used both from
// the Backends gallery split-button and from the "Install on more nodes" `+`
// affordance in the Nodes column. Submit fires N parallel per-node install
// calls; rows transition inline so the user sees per-node success/failure
// without leaving the modal.
//
// Props:
//   open               controls visibility
//   onClose            close handler (header X / Cancel / Esc / backdrop)
//   onComplete         fired after at least one node install succeeded;
//                      gallery uses this to refetch and update the Nodes
//                      column without a manual reload
//   backend            { name, isMeta, capabilities, metaBackendFor }
//   nodes              BackendNode[] from /api/nodes
//   allBackends        gallery backend list; used to find a concrete
//                      variant's parent meta backend and its hardware target
//   installedNodeIds   Set/array of node IDs that already have this backend
//   initialSelection   optional pre-selected node IDs (e.g. "missing nodes"
//                      when opened from the Nodes column `+` affordance)
//   addToast           optional (message, type) callback for the submit summary

const STATUS_LABELS = { healthy: 'Healthy', draining: 'Draining', unhealthy: 'Unhealthy', offline: 'Offline' }

function formatVRAM(bytes) {
  if (!bytes || bytes === 0) return null
  const gb = bytes / (1024 * 1024 * 1024)
  return gb >= 1 ? `${gb.toFixed(1)} GB` : `${(bytes / (1024 * 1024)).toFixed(0)} MB`
}

function gpuVendorLabel(vendor) {
  const labels = { nvidia: 'NVIDIA', amd: 'AMD', intel: 'Intel', vulkan: 'Vulkan' }
  return labels[vendor] || null
}

// hardwareTargetOf parses the capability key that points to a concrete
// variant in the parent meta's CapabilitiesMap. e.g. cpu-llama-cpp comes
// from {"cpu": "cpu-llama-cpp"} → "cpu". Falls back to "" when the parent
// is unknown (the gallery list payload still gives us metaBackendFor).
function hardwareTargetOf(backend, allBackends) {
  if (!backend || !backend.name || backend.isMeta) return ''
  const parentName = backend.metaBackendFor
  if (!parentName) return ''
  const parent = (allBackends || []).find(b => b.name === parentName || b.id === parentName)
  if (!parent || !parent.capabilities) return ''
  for (const [cap, concreteName] of Object.entries(parent.capabilities)) {
    if (concreteName === backend.name) return cap
  }
  return ''
}

// humanTargetLabel turns a capability key into a user-facing phrase used in
// the picker header note: "CPU build", "CUDA 12 build", etc. Keep it
// concrete and product-recognisable, not the raw token from the gallery.
function humanTargetLabel(target) {
  if (!target) return 'hardware-specific build'
  const t = target.toLowerCase()
  if (t.startsWith('cpu') || t === 'default') return 'CPU build'
  if (t.includes('cuda-13') || t.includes('cuda13')) return 'CUDA 13 build'
  if (t.includes('cuda-12') || t.includes('cuda12')) return 'CUDA 12 build'
  if (t.includes('cuda')) return 'NVIDIA CUDA build'
  if (t.includes('l4t')) return 'NVIDIA Jetson (L4T) build'
  if (t.includes('nvidia')) return 'NVIDIA build'
  if (t.includes('rocm') || t.includes('amd')) return 'AMD ROCm build'
  if (t.includes('metal')) return 'Apple Metal build'
  if (t.includes('sycl') || t.includes('intel')) return 'Intel SYCL build'
  if (t.includes('vulkan')) return 'Vulkan build'
  if (t.includes('darwin-x86')) return 'macOS x86 build'
  return 'hardware-specific build'
}

// suitabilityFor returns the picker's per-row suitability state for the
// requested backend. Already-installed wins over compatible/override so
// the user sees a single signal per row.
function suitabilityFor({ node, backend, hardwareTarget, alreadyInstalled }) {
  if (alreadyInstalled) return 'installed'
  // backend can be null on the first render before pickerBackend is set —
  // this function is invoked from useMemo, which runs regardless of the
  // outer open guard. Treat missing data as "compatible" so the placeholder
  // render doesn't blow up; the picker won't actually paint anything until
  // the early-return below the hooks fires.
  if (!backend || backend.isMeta || !hardwareTarget) return 'compatible'
  const vendor = (node.gpu_vendor || '').toLowerCase()
  const t = hardwareTarget.toLowerCase()
  if (t.startsWith('cpu') || t === 'default') {
    // CPU builds always run; they're never marked Override (running CPU on a
    // GPU node is the headline use case the user is choosing intentionally).
    return 'compatible'
  }
  if (t.includes('nvidia') || t.includes('cuda') || t.includes('l4t')) {
    return vendor === 'nvidia' ? 'compatible' : 'override'
  }
  if (t.includes('amd') || t.includes('rocm') || t.includes('hip')) {
    return vendor === 'amd' ? 'compatible' : 'override'
  }
  if (t.includes('intel') || t.includes('sycl')) {
    return vendor === 'intel' ? 'compatible' : 'override'
  }
  if (t.includes('metal') || t.includes('darwin')) {
    // No vendor reporting for Metal; trust the user.
    return 'compatible'
  }
  return 'compatible'
}
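The two helpers above combine in two steps: find the capability key under which the parent meta backend lists a concrete variant, then compare the GPU vendor that key implies with the node's reported `gpu_vendor`. The table-driven restatement below is a sketch, not the component's code. It assumes hypothetical `findHardwareTarget` and `suitability` helpers and omits `suitabilityFor`'s early exits for already-installed rows and meta backends. Rules are checked in order, and the first match decides, just as in the `if` chain.

```javascript
// Hypothetical table-driven restatement of suitabilityFor's vendor rules.
// vendor: null means "runs anywhere" (CPU builds are never an Override).
const TARGET_VENDOR_RULES = [
  { match: t => t.startsWith('cpu') || t === 'default', vendor: null },
  { match: t => /nvidia|cuda|l4t/.test(t), vendor: 'nvidia' },
  { match: t => /amd|rocm|hip/.test(t), vendor: 'amd' },
  { match: t => /intel|sycl/.test(t), vendor: 'intel' },
]

// Capability key under which `parent` lists `concreteName`, or '' if absent.
function findHardwareTarget(concreteName, parent) {
  for (const [cap, name] of Object.entries(parent?.capabilities || {})) {
    if (name === concreteName) return cap
  }
  return ''
}

function suitability(nodeVendor, target) {
  if (!target) return 'compatible'
  const t = target.toLowerCase()
  const rule = TARGET_VENDOR_RULES.find(r => r.match(t))
  // CPU, Metal/darwin and unrecognised targets are trusted as compatible.
  if (!rule || rule.vendor === null) return 'compatible'
  return (nodeVendor || '').toLowerCase() === rule.vendor ? 'compatible' : 'override'
}
```

For instance, with an illustrative parent `{ capabilities: { nvidia: 'cuda12-llama-cpp' } }` (names invented for the example), `findHardwareTarget('cuda12-llama-cpp', parent)` yields `'nvidia'`, and `suitability('amd', 'nvidia')` yields `'override'`, which is what makes the picker ask for mismatch confirmation before submitting.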
export default function NodeInstallPicker({
  open, onClose, onComplete,
  backend,
  nodes = [],
  allBackends = [],
  installedNodeIds = [],
  initialSelection,
  addToast,
}) {
  const [search, setSearch] = useState('')
  const [showHealthy, setShowHealthy] = useState(true)
  const [showDraining, setShowDraining] = useState(false)
  const [selected, setSelected] = useState(() => new Set())
  const [overrideVariant, setOverrideVariant] = useState('') // chosen concrete name
  const [overrideExpanded, setOverrideExpanded] = useState(false)
  const [submitting, setSubmitting] = useState(false)
  const [showMismatchConfirm, setShowMismatchConfirm] = useState(false)
  // Per-node submission state: { [nodeId]: { status: 'pending'|'installing'|'done'|'error', error? , version? } }
  const [perNode, setPerNode] = useState({})
  const headerInputRef = useRef(null)

  // Backend-derived metadata used throughout the picker.
  const hardwareTarget = useMemo(() => hardwareTargetOf(backend, allBackends), [backend, allBackends])
  const targetLabel = humanTargetLabel(hardwareTarget)
  const concreteVariants = useMemo(() => {
    if (!backend?.isMeta || !backend.capabilities) return []
    return Object.entries(backend.capabilities).map(([cap, concrete]) => ({
      value: concrete,
      label: `${concrete} · ${cap}`,
    }))
  }, [backend])

  // Pending nodes are surgically removed from the list — they can't accept
  // installs until approved. Surface the count instead of dead-disabled rows.
  const pendingCount = nodes.filter(n => n.status === 'pending').length
  const backendNodes = nodes.filter(n =>
    (!n.node_type || n.node_type === 'backend') && n.status !== 'pending'
  )

  const installedSet = useMemo(() => {
    const s = new Set()
    if (Array.isArray(installedNodeIds)) installedNodeIds.forEach(id => s.add(id))
    else if (installedNodeIds && typeof installedNodeIds.has === 'function') {
      installedNodeIds.forEach(id => s.add(id))
    }
    return s
  }, [installedNodeIds])

  const filteredNodes = useMemo(() => {
    let list = backendNodes
    if (!showHealthy) list = list.filter(n => n.status !== 'healthy')
    if (!showDraining) list = list.filter(n => n.status !== 'draining')
    if (search.trim()) {
      const q = search.toLowerCase()
      list = list.filter(n =>
        (n.name || '').toLowerCase().includes(q) ||
        Object.entries(n.labels || {}).some(([k, v]) => `${k}=${v}`.toLowerCase().includes(q))
      )
    }
    return list
  }, [backendNodes, showHealthy, showDraining, search])

  // Pre-seed selection on open. Reset all transient state so reopening
  // doesn't surface ghost progress from the prior submit.
  useEffect(() => {
    if (!open) return
    const initial = new Set()
    if (Array.isArray(initialSelection)) initialSelection.forEach(id => initial.add(id))
    setSelected(initial)
    setSearch('')
    setOverrideVariant('')
    setOverrideExpanded(false)
    setPerNode({})
    setSubmitting(false)
    setShowMismatchConfirm(false)
  }, [open, initialSelection])

  // Auto-expand the variant override disclosure when at least one selected
  // node lacks a working GPU. This is the headline use case the feature
  // exists for; surfacing it instead of hiding behind a click.
  useEffect(() => {
    if (!backend?.isMeta) return
    const someGPUMissing = Array.from(selected).some(id => {
      const n = backendNodes.find(x => x.id === id)
      return n && (!n.gpu_vendor || n.gpu_vendor === '' || n.gpu_vendor === 'unknown')
    })
    if (someGPUMissing && !overrideExpanded) setOverrideExpanded(true)
  }, [selected, backend, backendNodes]) // eslint-disable-line react-hooks/exhaustive-deps

  // The effective backend that gets installed on each node. For
  // hardware-specific backends this is just backend.name. For meta backends
  // with no override, the worker picks per-node — we pass backend.name and
  // the worker resolves. With an override set, the picker installs that
  // exact concrete variant on every selected node.
  const effectiveBackendName = overrideVariant || backend?.name

  const counts = useMemo(() => {
    let already = 0, overrides = 0
    selected.forEach(id => {
      const n = backendNodes.find(x => x.id === id)
      if (!n) return
      if (installedSet.has(id)) { already++; return }
      const eff = overrideVariant
        ? { name: overrideVariant, isMeta: false, metaBackendFor: backend?.name }
        : backend
      const target = overrideVariant ? hardwareTargetOf(eff, allBackends) : hardwareTarget
      const s = suitabilityFor({ node: n, backend: eff, hardwareTarget: target, alreadyInstalled: false })
      if (s === 'override') overrides++
    })
    return { already, overrides, selected: selected.size }
  }, [selected, backendNodes, installedSet, overrideVariant, backend, hardwareTarget, allBackends])

  const toggle = (nodeId) => {
    setSelected(prev => {
      const next = new Set(prev)
      next.has(nodeId) ? next.delete(nodeId) : next.add(nodeId)
      return next
    })
  }

  const selectAllHealthy = () => {
    setSelected(new Set(filteredNodes.filter(n => n.status === 'healthy').map(n => n.id)))
  }
  const selectCompatible = () => {
    const eff = overrideVariant
      ? { name: overrideVariant, isMeta: false, metaBackendFor: backend?.name }
      : backend
    const target = overrideVariant ? hardwareTargetOf(eff, allBackends) : hardwareTarget
    setSelected(new Set(
      filteredNodes
        .filter(n => suitabilityFor({ node: n, backend: eff, hardwareTarget: target, alreadyInstalled: false }) === 'compatible')
        .map(n => n.id)
    ))
  }
  const clearSelection = () => setSelected(new Set())

  const submit = async () => {
    if (selected.size === 0 || submitting) return
    if (counts.overrides > 0 && !showMismatchConfirm) {
      setShowMismatchConfirm(true)
      return
    }
    setShowMismatchConfirm(false)
    setSubmitting(true)
    const ids = Array.from(selected)
    setPerNode(prev => {
      const next = { ...prev }
      ids.forEach(id => { next[id] = { status: 'installing' } })
      return next
    })

    const results = await Promise.allSettled(ids.map(id =>
      nodesApi.installBackend(id, effectiveBackendName)
        .then(r => ({ id, ok: true, message: r?.message }))
        .catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
    ))

    // Tally outcomes before touching state. State updater functions must be
    // pure: React may run them later, during the next render, and StrictMode
    // calls them twice, so counting inside setPerNode can leave the counts
    // at zero or double them.
    const outcomes = results.filter(r => r.status === 'fulfilled').map(r => r.value)
    const successCount = outcomes.filter(v => v.ok).length
    const failCount = outcomes.length - successCount
    setPerNode(prev => {
      const next = { ...prev }
      for (const v of outcomes) {
        next[v.id] = v.ok ? { status: 'done' } : { status: 'error', error: v.error }
      }
      return next
    })
    setSubmitting(false)

    if (successCount > 0 && onComplete) onComplete()

    if (failCount === 0) {
      addToast?.(`Installed on ${successCount} node${successCount === 1 ? '' : 's'}`, 'success')
      setTimeout(() => onClose?.(), 800)
    } else if (successCount === 0) {
      addToast?.(`Install failed on all ${failCount} node${failCount === 1 ? '' : 's'}`, 'error')
    } else {
      addToast?.(`Installed on ${successCount}, failed on ${failCount}`, 'warning')
    }
  }

  const retryFailed = async () => {
    const failedIds = Object.entries(perNode)
      .filter(([, v]) => v.status === 'error')
      .map(([id]) => id)
    if (failedIds.length === 0) return
    setSelected(new Set(failedIds))
    // Replace state for failed rows so they show "installing" again, not stale errors.
    setPerNode(prev => {
      const next = { ...prev }
      failedIds.forEach(id => { next[id] = { status: 'installing' } })
      return next
    })
    setSubmitting(true)
    const results = await Promise.allSettled(failedIds.map(id =>
      nodesApi.installBackend(id, effectiveBackendName)
        .then(r => ({ id, ok: true, message: r?.message }))
        .catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
    ))
    // Count outside the (pure) state updater, as in submit().
    const outcomes = results.filter(r => r.status === 'fulfilled').map(r => r.value)
    const successCount = outcomes.filter(v => v.ok).length
    const failCount = outcomes.length - successCount
    setPerNode(prev => {
      const next = { ...prev }
      for (const v of outcomes) {
        next[v.id] = v.ok ? { status: 'done' } : { status: 'error', error: v.error }
      }
      return next
    })
    setSubmitting(false)
    if (successCount > 0 && onComplete) onComplete()
    if (failCount === 0) {
      addToast?.(`Installed on ${successCount} node${successCount === 1 ? '' : 's'}`, 'success')
      setTimeout(() => onClose?.(), 800)
    }
  }
|
||||
const doneCount = Object.values(perNode).filter(v => v.status === 'done').length
|
||||
const errorCount = Object.values(perNode).filter(v => v.status === 'error').length
|
||||
const totalAttempted = Object.keys(perNode).length
|
||||
|
||||
if (!open || !backend) return null
|
||||
|
||||
const noNodes = backendNodes.length === 0
|
||||
|
||||
  return (
    <Modal onClose={onClose} maxWidth="780px">
      <div style={{
        padding: 'var(--spacing-md) var(--spacing-lg)',
        borderBottom: '1px solid var(--color-border-subtle)',
        display: 'flex',
        alignItems: 'center',
        justifyContent: 'space-between',
        gap: 'var(--spacing-sm)',
      }}>
        <h2 style={{ margin: 0, fontSize: '1rem', display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)' }}>
          <i className="fas fa-cog" style={{ color: 'var(--color-primary)' }} />
          Install <span style={{ fontFamily: 'var(--font-mono)' }}>{backend.name}</span>
          {backend.isMeta ? (
            <span className="badge badge-info" style={{ fontSize: '0.6875rem' }}>Auto-resolving</span>
          ) : (
            <span className="badge badge-warning" style={{ fontSize: '0.6875rem' }}>Hardware-specific</span>
          )}
        </h2>
        <button
          type="button"
          className="btn btn-ghost btn-sm"
          onClick={onClose}
          aria-label="Close"
          style={{ fontSize: '1.125rem', lineHeight: 1, padding: '4px 10px' }}
        >×</button>
      </div>

      <div style={{ padding: 'var(--spacing-md) var(--spacing-lg)' }}>
        {!backend.isMeta && (
          <div className="card" style={{
            marginBottom: 'var(--spacing-md)',
            padding: 'var(--spacing-sm) var(--spacing-md)',
            background: 'var(--color-warning-light)',
            border: '1px solid var(--color-warning-border)',
            borderRadius: 'var(--radius-md)',
            display: 'flex',
            alignItems: 'center',
            gap: 'var(--spacing-sm)',
          }}>
            <i className="fas fa-microchip" style={{ color: 'var(--color-warning)' }} />
            <span style={{ color: 'var(--color-warning)', fontSize: '0.8125rem' }}>
              {targetLabel}. Install only on nodes where you want this build to run.
              {hardwareTarget && ` Targets: ${humanTargetLabel(hardwareTarget).replace(' build', '')}.`}
            </span>
          </div>
        )}

        {noNodes ? (
          <div className="empty-state" style={{ padding: 'var(--spacing-xl) 0' }}>
            <div className="empty-state-icon"><i className="fas fa-server" /></div>
            <h3 className="empty-state-title">No backend nodes available</h3>
            <p className="empty-state-text">
              Approve pending workers or register new ones.
              {pendingCount > 0 && ` (${pendingCount} awaiting approval.)`}
            </p>
            <a className="btn btn-secondary btn-sm" href="/app/nodes">
              <i className="fas fa-network-wired" /> Manage nodes
            </a>
          </div>
        ) : (
          <>
            {/* Filter row */}
            <div style={{ display: 'flex', gap: 'var(--spacing-sm)', alignItems: 'center', marginBottom: 'var(--spacing-sm)', flexWrap: 'wrap' }}>
              <div className="search-bar" style={{ flex: 1, minWidth: 180 }}>
                <i className="fas fa-search search-icon" />
                <input
                  ref={headerInputRef}
                  className="input"
                  placeholder="Filter nodes by name or label..."
                  value={search}
                  onChange={e => setSearch(e.target.value)}
                />
              </div>
              <button className="btn btn-secondary btn-sm" onClick={selectAllHealthy} type="button">
                Select all healthy
              </button>
              <button className="btn btn-secondary btn-sm" onClick={selectCompatible} type="button">
                Select compatible nodes
              </button>
              {selected.size > 0 && (
                <button className="btn btn-ghost btn-sm" onClick={clearSelection} type="button">
                  Clear
                </button>
              )}
            </div>

            {/* Variant override (auto-resolving only) */}
            {backend.isMeta && concreteVariants.length > 0 && (
              <div style={{ marginBottom: 'var(--spacing-sm)' }}>
                <button
                  type="button"
                  className="btn btn-ghost btn-sm"
                  onClick={() => setOverrideExpanded(v => !v)}
                  aria-expanded={overrideExpanded}
                  style={{ padding: '4px 8px' }}
                >
                  <i className={`fas fa-chevron-${overrideExpanded ? 'down' : 'right'}`} style={{ marginRight: 4, fontSize: '0.625rem' }} />
                  Override variant for selected nodes…
                </button>
                {overrideExpanded && (
                  <div className="card" style={{ marginTop: 4, padding: 'var(--spacing-sm) var(--spacing-md)' }}>
                    <p style={{ fontSize: '0.75rem', color: 'var(--color-text-secondary)', marginTop: 0, marginBottom: 'var(--spacing-xs)' }}>
                      By default each node picks its own variant. Override to install one specific variant on every selected node — useful when GPU detection fails on a node and you want the CPU build there instead.
                    </p>
                    <SearchableSelect
                      value={overrideVariant}
                      onChange={setOverrideVariant}
                      options={concreteVariants}
                      placeholder="Per-node auto-resolve (default)"
                      allOption={{ value: '', label: 'Per-node auto-resolve (default)' }}
                    />
                  </div>
                )}
              </div>
            )}

            {/* Node table */}
            <div className="table-container" style={{ marginBottom: 'var(--spacing-sm)', maxHeight: '40vh', overflowY: 'auto' }}>
              <table className="table" style={{ margin: 0 }}>
                <thead>
                  <tr>
                    <th style={{ width: 28 }}>
                      <input
                        type="checkbox"
                        aria-label="Select all visible"
                        checked={filteredNodes.length > 0 && filteredNodes.every(n => selected.has(n.id))}
                        onChange={(e) => {
                          setSelected(prev => {
                            const next = new Set(prev)
                            if (e.target.checked) filteredNodes.forEach(n => next.add(n.id))
                            else filteredNodes.forEach(n => next.delete(n.id))
                            return next
                          })
                        }}
                      />
                    </th>
                    <th>Node</th>
                    <th>Status</th>
                    <th>Hardware</th>
                    <th>Suitability</th>
                  </tr>
                </thead>
                <tbody>
                  {filteredNodes.map(node => {
                    const installed = installedSet.has(node.id)
                    const eff = overrideVariant
                      ? { name: overrideVariant, isMeta: false, metaBackendFor: backend.name }
                      : backend
                    const target = overrideVariant ? hardwareTargetOf(eff, allBackends) : hardwareTarget
                    const suit = suitabilityFor({ node, backend: eff, hardwareTarget: target, alreadyInstalled: installed })
                    const isSel = selected.has(node.id)
                    const rowState = perNode[node.id]
                    const vendor = gpuVendorLabel(node.gpu_vendor)
                    const totalVRAM = formatVRAM(node.total_vram)
                    const totalRAM = formatVRAM(node.total_ram)
                    return (
                      <tr key={node.id}>
                        <td>
                          <input
                            type="checkbox"
                            aria-label={`Select ${node.name}`}
                            aria-disabled={rowState?.status === 'installing'}
                            checked={isSel}
                            onChange={() => toggle(node.id)}
                          />
                        </td>
                        <td>
                          <div style={{ display: 'flex', flexDirection: 'column', gap: 2 }}>
                            <span style={{ fontWeight: 500, fontSize: '0.875rem' }}>{node.name}</span>
                            {node.labels && Object.keys(node.labels).length > 0 && (
                              <div style={{ display: 'flex', flexWrap: 'wrap', gap: 3 }}>
                                {Object.entries(node.labels).slice(0, 3).map(([k, v]) => (
                                  <span key={k} className="cell-mono" style={{
                                    padding: '1px 5px', borderRadius: 'var(--radius-sm)', fontSize: '0.6875rem',
                                    background: 'var(--color-bg-tertiary)', border: '1px solid var(--color-border-subtle)',
                                  }}>{k}={v}</span>
                                ))}
                                {Object.keys(node.labels).length > 3 && (
                                  <span className="cell-muted" style={{ fontSize: '0.6875rem' }}>
                                    +{Object.keys(node.labels).length - 3}
                                  </span>
                                )}
                              </div>
                            )}
                          </div>
                        </td>
                        <td>
                          <span style={{ fontSize: '0.8125rem' }}>
                            {STATUS_LABELS[node.status] || node.status}
                          </span>
                        </td>
                        <td style={{ fontSize: '0.8125rem', fontFamily: 'var(--font-mono)', color: 'var(--color-text-secondary)' }}>
                          {totalVRAM ? (
                            <>{vendor && <span style={{ marginRight: 4 }}>{vendor}</span>}{totalVRAM}</>
                          ) : totalRAM ? (
                            <span>CPU · {totalRAM}</span>
                          ) : <span className="cell-muted">—</span>}
                        </td>
                        <td>
                          {rowState?.status === 'installing' ? (
                            <span className="badge badge-info">
                              <i className="fas fa-spinner fa-spin" style={{ marginRight: 4 }} />Installing
                            </span>
                          ) : rowState?.status === 'done' ? (
                            <span className="badge badge-success">
                              <i className="fas fa-check" style={{ marginRight: 4 }} />Installed
                            </span>
                          ) : rowState?.status === 'error' ? (
                            <button
                              type="button"
                              className="badge badge-error"
                              title={rowState.error}
                              aria-describedby={`err-${node.id}`}
                              style={{ border: 'none', cursor: 'help' }}
                            >
                              <i className="fas fa-exclamation-triangle" style={{ marginRight: 4 }} />Failed
                              <span id={`err-${node.id}`} style={{ position: 'absolute', left: -9999 }}>{rowState.error}</span>
                            </button>
                          ) : suit === 'installed' ? (
                            <span className="badge" style={{ background: 'var(--color-bg-tertiary)', color: 'var(--color-text-muted)' }}>
                              Installed
                            </span>
                          ) : suit === 'override' ? (
                            <span className="badge badge-warning">
                              <i className="fas fa-exclamation-circle" style={{ marginRight: 4 }} />Override
                            </span>
                          ) : (
                            <span className="badge badge-success" style={{ background: 'var(--color-success-light)', color: 'var(--color-success)' }}>
                              Compatible
                            </span>
                          )}
                        </td>
                      </tr>
                    )
                  })}
                  {filteredNodes.length === 0 && (
                    <tr>
                      <td colSpan={5} style={{ textAlign: 'center', padding: 'var(--spacing-md)', color: 'var(--color-text-muted)' }}>
                        No nodes match the current filters.
                      </td>
                    </tr>
                  )}
                </tbody>
              </table>
            </div>

            {pendingCount > 0 && (
              <p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 0, marginBottom: 'var(--spacing-sm)' }}>
                +{pendingCount} awaiting approval — <a href="/app/nodes" style={{ color: 'var(--color-primary)' }}>approve from Nodes</a>.
              </p>
            )}

            {/* Mismatch confirm */}
            {showMismatchConfirm && (
              <div className="card" style={{
                marginBottom: 'var(--spacing-sm)',
                padding: 'var(--spacing-md)',
                background: 'var(--color-warning-light)',
                border: '1px solid var(--color-warning-border)',
                borderRadius: 'var(--radius-md)',
              }}>
                <p style={{ marginTop: 0, marginBottom: 'var(--spacing-sm)', color: 'var(--color-warning)', fontSize: '0.875rem' }}>
                  Installing {targetLabel.toLowerCase()} on {counts.overrides} {counts.overrides === 1 ? "node that doesn't" : "nodes that don't"} match. Those nodes will run inference on the chosen build, not their native GPU. Continue?
                </p>
                <div style={{ display: 'flex', gap: 'var(--spacing-sm)', justifyContent: 'flex-end' }}>
                  <button className="btn btn-secondary btn-sm" type="button" onClick={() => setShowMismatchConfirm(false)}>
                    Cancel
                  </button>
                  <button className="btn btn-primary btn-sm" type="button" onClick={submit}
                    style={{ background: 'var(--color-warning)', borderColor: 'var(--color-warning)' }}>
                    Install on {targetLabel.replace(' build', '')}
                  </button>
                </div>
              </div>
            )}
          </>
        )}
      </div>

      {!noNodes && (
        <div style={{
          padding: 'var(--spacing-md) var(--spacing-lg)',
          borderTop: '1px solid var(--color-border-subtle)',
          display: 'flex',
          alignItems: 'center',
          gap: 'var(--spacing-sm)',
          flexWrap: 'wrap',
        }}>
          <div style={{ flex: 1, fontSize: '0.8125rem', color: 'var(--color-text-secondary)' }}>
            {totalAttempted > 0 ? (
              <>
                {doneCount} of {totalAttempted} done
                {errorCount > 0 && (
                  <> · <span className="badge badge-error" style={{ fontSize: '0.6875rem' }}>{errorCount} failed</span></>
                )}
              </>
            ) : (
              <>
                {counts.selected} {counts.selected === 1 ? 'node' : 'nodes'} selected
                {counts.already > 0 && <> · {counts.already} already installed</>}
                {counts.overrides > 0 && <> · {counts.overrides} override{counts.overrides === 1 ? '' : 's'}</>}
              </>
            )}
          </div>
          {errorCount > 0 && !submitting && (
            <button className="btn btn-secondary btn-sm" type="button" onClick={retryFailed}>
              <i className="fas fa-redo" /> Retry failed nodes
            </button>
          )}
          <button className="btn btn-secondary btn-sm" type="button" onClick={onClose} disabled={submitting}>
            {totalAttempted > 0 && doneCount > 0 ? 'Close' : 'Cancel'}
          </button>
          <button
            className="btn btn-primary btn-sm"
            type="button"
            onClick={submit}
            disabled={submitting || counts.selected === 0 || showMismatchConfirm}
          >
            {submitting ? (
              <><i className="fas fa-spinner fa-spin" /> Installing…</>
            ) : (
              <>Install on {counts.selected} {counts.selected === 1 ? 'node' : 'nodes'}</>
            )}
          </button>
        </div>
      )}
    </Modal>
  )
}
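The retry handler above fans out one install request per node with `Promise.allSettled` and then tallies the outcomes. The sketch below reproduces that flow outside React so it runs in plain Node. `makeFakeInstaller`, `installOnNodes`, `mergePerNode` and the `'llama-cpp'` backend name are hypothetical stand-ins for `nodesApi.installBackend` and the `setPerNode` updater, not LocalAI APIs. The sketch counts successes and failures from `results` directly, because React may defer state updater functions and calls them twice under StrictMode.

```javascript
// Hypothetical stand-in for nodesApi.installBackend: resolves for every node
// except the ids listed in `failing`, which reject with an Error.
function makeFakeInstaller(failing) {
  return async (nodeId, backendName) => {
    if (failing.includes(nodeId)) throw new Error(`no GPU on ${nodeId}`)
    return { message: `installed ${backendName} on ${nodeId}` }
  }
}

// Fan out one install per node. Each promise is normalized with .then/.catch,
// so every allSettled entry is 'fulfilled' and carries an `ok` flag.
async function installOnNodes(nodeIds, backendName, installBackend) {
  const results = await Promise.allSettled(nodeIds.map(id =>
    installBackend(id, backendName)
      .then(r => ({ id, ok: true, message: r?.message }))
      .catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
  ))
  // Tally from the results themselves rather than inside a state updater.
  let successCount = 0
  let failCount = 0
  for (const r of results) {
    if (r.status !== 'fulfilled') continue
    if (r.value.ok) successCount++
    else failCount++
  }
  return { results, successCount, failCount }
}

// Pure equivalent of a setPerNode(prev => ...) updater: returns a new map
// with each node's outcome and leaves `prev` untouched.
function mergePerNode(prev, results) {
  const next = { ...prev }
  for (const r of results) {
    if (r.status !== 'fulfilled') continue
    const v = r.value
    next[v.id] = v.ok ? { status: 'done' } : { status: 'error', error: v.error }
  }
  return next
}
```

Because `mergePerNode` has no side effects, it stays correct even if React invokes it more than once.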
29
core/http/react-ui/src/components/ResourceActions.jsx
Normal file
@@ -0,0 +1,29 @@
// ResourceActions groups row-level buttons into a lifecycle cluster (start,
// stop, pin, reinstall, upgrade) and a destructive cluster (delete) with a
// thin divider between them, so destructive actions are visually separated
// from routine ones. Replaces the old 4px-gap row of buttons on the Manage page,
// where Stop / Pin / Delete sat shoulder-to-shoulder with no visual cue to
// tell "click to fiddle" apart from "click to throw away".
//
// `lifecycle` and `destructive` accept any ReactNode, typically one or more
// <button>s. The wrapping div stops click propagation so action clicks don't
// also expand the row.
export default function ResourceActions({ lifecycle, destructive }) {
  const hasLifecycle = !!lifecycle
  const hasDestructive = !!destructive
  if (!hasLifecycle && !hasDestructive) return null

  return (
    <div className="resource-actions" onClick={e => e.stopPropagation()}>
      {hasLifecycle && (
        <div className="resource-actions__group">{lifecycle}</div>
      )}
      {hasLifecycle && hasDestructive && (
        <span className="resource-actions__divider" aria-hidden="true" />
      )}
      {hasDestructive && (
        <div className="resource-actions__group">{destructive}</div>
      )}
    </div>
  )
}
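ResourceActions' render decision can be modelled as a pure function, which makes the divider rule easy to check. `actionSegments` is a hypothetical helper, not part of the component; it returns the segments the component would render, in order, using the same `!!prop` truthiness test.

```javascript
// Hypothetical pure model of ResourceActions' layout: returns the segments
// the component would render, in order, or null when it renders nothing.
// Presence uses the same truthiness test as the component (!!prop), so a
// `cond && <button />` expression that evaluates to false hides its group.
function actionSegments({ lifecycle, destructive }) {
  const hasLifecycle = !!lifecycle
  const hasDestructive = !!destructive
  if (!hasLifecycle && !hasDestructive) return null
  const segments = []
  if (hasLifecycle) segments.push('lifecycle')
  // The divider only appears when both clusters are present.
  if (hasLifecycle && hasDestructive) segments.push('divider')
  if (hasDestructive) segments.push('destructive')
  return segments
}
```

One consequence of the truthiness test: a caller can pass `canDelete && <button>…</button>` directly, and a `false` result simply drops the destructive group and its divider.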
@@ -51,7 +51,7 @@ export default function ResourceMonitor() {
<div className="resource-bar-container" style={{ flex: 1 }}>
<div className="resource-bar" style={{ width: `${pct}%`, background: color }} />
</div>
<span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: "'JetBrains Mono', monospace", color, minWidth: '3em', textAlign: 'right' }}>
<span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: 'var(--font-mono)', color, minWidth: '3em', textAlign: 'right' }}>
{pct.toFixed(0)}%
</span>
</div>
@@ -76,7 +76,7 @@ export default function ResourceMonitor() {
<div className="resource-bar-container" style={{ flex: 1 }}>
<div className="resource-bar" style={{ width: `${ram.usage_percent || 0}%`, background: percentColor(ram.usage_percent || 0) }} />
</div>
<span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: "'JetBrains Mono', monospace", color: percentColor(ram.usage_percent || 0), minWidth: '3em', textAlign: 'right' }}>
<span style={{ fontSize: '0.8125rem', fontWeight: 600, fontFamily: 'var(--font-mono)', color: percentColor(ram.usage_percent || 0), minWidth: '3em', textAlign: 'right' }}>
{(ram.usage_percent || 0).toFixed(0)}%
</span>
</div>
@@ -91,7 +91,7 @@ export default function ResourceMonitor() {
{isGpu && aggregate.gpu_count > 1 && (
<div style={{ fontSize: '0.75rem', color: 'var(--color-text-secondary)', marginTop: 'var(--spacing-sm)', display: 'flex', justifyContent: 'space-between' }}>
<span>Total VRAM</span>
<span style={{ fontFamily: "'JetBrains Mono', monospace" }}>
<span style={{ fontFamily: 'var(--font-mono)' }}>
{formatBytes(aggregate.used_memory)} / {formatBytes(aggregate.total_memory)} ({aggregate.usage_percent?.toFixed(1)}%)
</span>
</div>
@@ -101,7 +101,7 @@ export default function ResourceMonitor() {
{resources.storage_size != null && (
<div style={{ fontSize: '0.75rem', color: 'var(--color-text-secondary)', marginTop: 'var(--spacing-sm)', display: 'flex', justifyContent: 'space-between' }}>
<span>Models storage</span>
<span style={{ fontFamily: "'JetBrains Mono', monospace", color: 'var(--color-text-primary)' }}>
<span style={{ fontFamily: 'var(--font-mono)', color: 'var(--color-text-primary)' }}>
{formatBytes(resources.storage_size)}
</span>
</div>

81
core/http/react-ui/src/components/ResourceRow.jsx
Normal file
@@ -0,0 +1,81 @@
import { Fragment } from 'react'

// ResourceRow renders the visible row + its conditional detail row as a pair
// of <tr>s, so the existing .table styling keeps applying and the Manage page
// can re-use the gallery's expand-to-detail interaction without inventing a
// new table system. The consumer owns the cells (which pass through as
// children) — this component only manages the click-to-expand handler, the
// dimmed state for disabled rows, and the colSpan'd detail row beneath.
//
// `onToggleExpand` fires on row click only. Buttons / toggles inside cells
// must call e.stopPropagation() (or be wrapped in an .actions-stop wrapper)
// to avoid double-triggering the expand.
export default function ResourceRow({
  expanded,
  onToggleExpand,
  detail,
  colSpan,
  dimmed,
  className = '',
  children,
}) {
  return (
    <Fragment>
      <tr
        className={`resource-row${dimmed ? ' is-dimmed' : ''}${expanded ? ' is-expanded' : ''} ${className}`.trim()}
        onClick={onToggleExpand}
        style={{ cursor: onToggleExpand ? 'pointer' : 'default' }}
      >
        {children}
      </tr>
      {expanded && detail && (
        <tr className="resource-row__detail-row">
          <td colSpan={colSpan} className="resource-row__detail-cell">
            {detail}
          </td>
        </tr>
      )}
    </Fragment>
  )
}

// ChevronCell is the small rotating chevron used as the leftmost cell of an
// expandable row. Mirrors the Nodes/Models/Backends gallery affordance so
// users see the same "click to expand" cue everywhere.
export function ChevronCell({ expanded }) {
  return (
    <td className="resource-row__chevron-cell">
      <span className={`row-chevron${expanded ? ' is-expanded' : ''}`} aria-hidden="true">
        <i className="fas fa-chevron-right" />
      </span>
    </td>
  )
}

// IconCell renders the 48px brand icon shell, the same one the Install
// gallery uses. `icon` is the image URL (from gallery metadata); when it is
// absent we fall back to a FontAwesome glyph so custom-imported items still
// get a placeholder instead of an empty square (broken URLs are not detected).
export function IconCell({ icon, fallback = 'fa-cube', alt = '' }) {
  return (
    <td className="resource-row__icon-cell">
      <div className="resource-row__icon">
        {icon ? (
          <img src={icon} alt={alt} loading="lazy" />
        ) : (
          <i className={`fas ${fallback}`} />
        )}
      </div>
    </td>
  )
}

// StopPropagationCell wraps cell contents that contain interactive controls
// (Toggle, action buttons) so a click on them doesn't also expand the row.
export function StopPropagationCell({ children, ...props }) {
  return (
    <td {...props} onClick={e => e.stopPropagation()}>
      {children}
    </td>
  )
}
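The rule in ResourceRow's comments (controls inside cells must stop propagation) follows from event bubbling. The toy model below uses plain objects, not the DOM or React's synthetic events; `makeNode` and `dispatchClick` are hypothetical helpers. It shows a button click reaching the row's expand handler through a plain cell, and being stopped by a cell that behaves like `StopPropagationCell`.

```javascript
// Toy model of click bubbling (plain objects, not the DOM or React's synthetic
// events). A click runs handlers from the target up through its ancestors
// until one of them calls stopPropagation().
function makeNode(name, parent = null, onClick = null) {
  return { name, parent, onClick }
}

function dispatchClick(target) {
  const event = {
    target,
    stopped: false,
    stopPropagation() { this.stopped = true },
  }
  const fired = []
  for (let node = target; node && !event.stopped; node = node.parent) {
    if (node.onClick) {
      node.onClick(event)
      fired.push(node.name)
    }
  }
  return fired
}

// A row whose onClick toggles expansion, like ResourceRow's onToggleExpand.
let expanded = false
const row = makeNode('tr', null, () => { expanded = !expanded })
// A plain <td> versus one that behaves like StopPropagationCell.
const plainCell = makeNode('td', row)
const stopCell = makeNode('td.stop', row, e => e.stopPropagation())
const buttonInPlain = makeNode('button', plainCell, () => {})
const buttonInStop = makeNode('button', stopCell, () => {})

console.log(dispatchClick(buttonInPlain), expanded) // [ 'button', 'tr' ] true
console.log(dispatchClick(buttonInStop), expanded)  // [ 'button', 'td.stop' ] true
```

The second click leaves `expanded` unchanged because the stopping cell sits between the button and the row.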
@@ -116,7 +116,7 @@ export default function SearchableSelect({
aria-expanded={open}
onClick={() => { if (!disabled) { setOpen(!open); setQuery(''); setFocusIndex(-1) } }}
style={{
width: '100%', padding: '4px 8px', fontSize: '0.8125rem',
width: '100%', padding: 'var(--spacing-xs) var(--spacing-sm)', fontSize: '0.8125rem',
cursor: disabled ? 'not-allowed' : 'pointer',
display: 'flex', alignItems: 'center', gap: '6px',
background: 'var(--color-bg-primary)', border: '1px solid var(--color-border)',
@@ -145,7 +145,7 @@ export default function SearchableSelect({
value={query}
onChange={(e) => { setQuery(e.target.value); setFocusIndex(-1) }}
onKeyDown={handleKeyDown}
style={{ width: '100%', padding: '4px 8px', fontSize: '0.8125rem' }}
style={{ width: '100%', padding: 'var(--spacing-xs) var(--spacing-sm)', fontSize: '0.8125rem' }}
/>
</div>
<div ref={listRef} role="listbox" style={{ overflowY: 'auto', maxHeight: 'min(200px, 50vh)' }}>

@@ -1,4 +1,4 @@
import { useState, useEffect } from 'react'
import { useState, useEffect, useRef } from 'react'
import { NavLink, useNavigate, useLocation } from 'react-router-dom'
import ThemeToggle from './ThemeToggle'
import { useAuth } from '../context/AuthContext'
@@ -107,11 +107,22 @@ export default function Sidebar({ isOpen, onClose }) {
const { isAdmin, authEnabled, user, logout, hasFeature } = useAuth()
const navigate = useNavigate()
const location = useLocation()
const closeBtnRef = useRef(null)

useEffect(() => {
fetch(apiUrl('/api/features')).then(r => r.json()).then(setFeatures).catch(() => {})
}, [])

// Move focus into the drawer when opened on mobile/tablet so keyboard
// and screen-reader users land inside the dialog. Targeting the close
// button avoids hijacking the visual focus to a nav item the user may
// not have meant to activate.
useEffect(() => {
if (!isOpen) return
const id = window.requestAnimationFrame(() => closeBtnRef.current?.focus())
return () => window.cancelAnimationFrame(id)
}, [isOpen])

// Auto-expand section containing the active route
useEffect(() => {
for (const section of sections) {
@@ -168,7 +179,11 @@ export default function Sidebar({ isOpen, onClose }) {
<>
{isOpen && <div className="sidebar-overlay" onClick={onClose} />}

<aside className={`sidebar ${isOpen ? 'open' : ''} ${collapsed ? 'collapsed' : ''}`}>
<aside
id="app-sidebar"
className={`sidebar ${isOpen ? 'open' : ''} ${collapsed ? 'collapsed' : ''}`}
aria-label="Primary navigation"
>
{/* Logo */}
<div className="sidebar-header">
<a href="./" className="sidebar-logo-link">
@@ -177,8 +192,13 @@ export default function Sidebar({ isOpen, onClose }) {
<a href="./" className="sidebar-logo-icon" title="LocalAI">
<img src={apiUrl('/static/logo.png')} alt="LocalAI" className="sidebar-logo-icon-img" />
</a>
<button className="sidebar-close-btn" onClick={onClose} aria-label="Close menu">
<i className="fas fa-times" />
<button
ref={closeBtnRef}
className="sidebar-close-btn"
onClick={onClose}
aria-label="Close menu"
>
<i className="fas fa-times" aria-hidden="true" />
</button>
</div>

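The Sidebar's focus effect defers `focus()` to the next animation frame and cancels that frame in its cleanup. Presumably the deferral gives the drawer's open styles a chance to apply first; the cleanup matters because React runs it when `isOpen` changes or the component unmounts, so a drawer that closes before the frame fires never steals focus. The framework-free sketch below models that; `focusOnOpen` and `makeFakeFrames` are hypothetical names, and `raf`/`caf` stand in for `window.requestAnimationFrame`/`window.cancelAnimationFrame`.

```javascript
// Framework-free sketch of the Sidebar's focus effect.
function focusOnOpen(isOpen, getTarget, raf, caf) {
  if (!isOpen) return undefined              // closed: nothing to schedule
  const id = raf(() => getTarget()?.focus()) // focus on the next frame
  return () => caf(id)                       // cleanup cancels a pending frame
}

// Minimal frame scheduler so the sketch runs outside a browser.
function makeFakeFrames() {
  let nextId = 1
  const pending = new Map()
  return {
    raf(cb) { const id = nextId++; pending.set(id, cb); return id },
    caf(id) { pending.delete(id) },
    flush() {
      const callbacks = [...pending.values()]
      pending.clear()
      callbacks.forEach(cb => cb())
    },
  }
}
```

The optional chaining on `getTarget()?.focus()` mirrors `closeBtnRef.current?.focus()`: if the ref is still empty when the frame runs, nothing happens.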
39
core/http/react-ui/src/components/StatCard.jsx
Normal file
@@ -0,0 +1,39 @@
// StatCard renders a single cluster/dashboard metric card. The left accent
// bar + icon chip color is driven by `accentVar` (a CSS custom property name,
// e.g. "--color-success") so the card reads as semantic without the caller
// having to reach into colors directly. `onClick` upgrades the card to a
// keyboard-focusable button — used by the Manage page so cards double as
// shortcuts to the relevant tab + filter.
export default function StatCard({ icon, label, value, color, accentVar, onClick }) {
  const accent = color || (accentVar ? `var(${accentVar})` : 'var(--color-text-primary)')
  const interactive = typeof onClick === 'function'

  const handleKeyDown = interactive
    ? (e) => {
        if (e.key === 'Enter' || e.key === ' ') {
          e.preventDefault()
          onClick(e)
        }
      }
    : undefined

  return (
    <div
      className="stat-card"
      data-clickable={interactive ? 'true' : undefined}
      role={interactive ? 'button' : undefined}
      tabIndex={interactive ? 0 : undefined}
      onClick={interactive ? onClick : undefined}
      onKeyDown={handleKeyDown}
      style={accentVar ? { ['--stat-accent']: `var(${accentVar})` } : undefined}
    >
      <div className="stat-card__body">
        <div className="stat-card__label">{label}</div>
        <div className="stat-card__value" style={{ color: accent }}>{value}</div>
      </div>
      <div className="stat-card__icon" style={accentVar ? { color: accent } : undefined}>
        <i className={icon} />
      </div>
    </div>
  )
}
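A `div` with `role="button"` gets no keyboard behaviour for free, which is why StatCard wires up `onKeyDown` itself. The sketch below isolates that handler so it can be exercised without a browser; `makeKeyHandler` is a hypothetical name, and the event only needs `key` and `preventDefault()`.

```javascript
// Framework-free sketch of StatCard's keyboard handling.
function makeKeyHandler(onClick) {
  // Non-interactive cards get no handler, mirroring `interactive ? ... : undefined`.
  if (typeof onClick !== 'function') return undefined
  return (e) => {
    if (e.key === 'Enter' || e.key === ' ') {
      // Without this, Space would also scroll the page.
      e.preventDefault()
      onClick(e)
    }
  }
}
```

Note that this activates on keydown for both keys, whereas a native `<button>` fires its Space activation on keyup; for a dashboard shortcut the difference is unlikely to matter.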
@@ -24,7 +24,7 @@ export default function TemplateSelector({ onSelect }) {
<p style={{ fontSize: '0.8125rem', color: 'var(--color-text-secondary)', lineHeight: 1.5, margin: 0 }}>
{t.description}
</p>
<div style={{ display: 'flex', flexWrap: 'wrap', gap: '4px', marginTop: 'var(--spacing-xs)' }}>
<div style={{ display: 'flex', flexWrap: 'wrap', gap: 'var(--spacing-xs)', marginTop: 'var(--spacing-xs)' }}>
{Object.keys(t.fields).filter(k => k !== 'name').map(k => (
<span key={k} className="badge" style={{
fontSize: '0.6875rem', background: 'var(--color-bg-tertiary)',

@@ -187,7 +187,7 @@ export default function UnifiedMCPDropdown({
placeholder="Server URL (e.g. https://mcp.example.com/sse)"
value={url}
onChange={e => setUrl(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<input
type="text"
@@ -195,7 +195,7 @@ export default function UnifiedMCPDropdown({
placeholder="Name (optional)"
value={name}
onChange={e => setName(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<input
type="password"
@@ -203,13 +203,13 @@ export default function UnifiedMCPDropdown({
placeholder="Auth token (optional)"
value={authToken}
onChange={e => setAuthToken(e.target.value)}
style={{ width: '100%', marginBottom: '4px' }}
style={{ width: '100%', marginBottom: 'var(--spacing-xs)' }}
/>
<label style={{ display: 'flex', alignItems: 'center', gap: '6px', fontSize: '0.8rem', marginBottom: '6px' }}>
<input type="checkbox" checked={useProxy} onChange={e => setUseProxy(e.target.checked)} />
Use CORS proxy
</label>
<div style={{ display: 'flex', gap: '4px', justifyContent: 'flex-end' }}>
<div style={{ display: 'flex', gap: 'var(--spacing-xs)', justifyContent: 'flex-end' }}>
<button type="button" className="btn btn-sm btn-secondary" onClick={() => setAddDialog(false)}>Cancel</button>
<button type="button" className="btn btn-sm btn-primary" onClick={handleAddClient} disabled={!url.trim()}>Add</button>
</div>

40
core/http/react-ui/src/hooks/useDistributedMode.js
vendored
Normal file
@@ -0,0 +1,40 @@
import { useState, useEffect, useCallback } from 'react'
import { nodesApi } from '../utils/api'

// useDistributedMode probes /api/nodes to decide whether the running LocalAI
// is in distributed mode. The endpoint returns 503 when distributed mode is
// disabled — we treat any failure as standalone, mirroring the detection
// pattern in pages/Nodes.jsx so UI behaviour matches the Nodes page.
//
// Returns:
//   enabled — true when the most recent probe succeeded
//   nodes   — the most recent /api/nodes response (array; possibly empty)
//   loading — true until the first probe completes
//   refetch — manual trigger; the picker calls this after install/delete
//
// Components that need a live nodes list (e.g. install picker) re-call
// refetch after operations complete. The hook does not poll on its own —
// the Nodes page handles its own 5s polling and the Backends gallery only
// needs a one-shot read on mount.
export function useDistributedMode() {
  const [enabled, setEnabled] = useState(false)
  const [nodes, setNodes] = useState([])
  const [loading, setLoading] = useState(true)

  const probe = useCallback(async () => {
    try {
      const data = await nodesApi.list()
      setNodes(Array.isArray(data) ? data : [])
      setEnabled(true)
    } catch {
      setEnabled(false)
      setNodes([])
    } finally {
      setLoading(false)
    }
  }, [])

  useEffect(() => { probe() }, [probe])

  return { enabled, nodes, loading, refetch: probe }
}
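The hook's probe is a small state machine: every successful call switches distributed mode on, and any rejection (such as the 503 returned when distributed mode is disabled) switches it off and clears the node list. The sketch below models that without React; `createModeProbe` is a hypothetical name and `listNodes` is a stub standing in for `nodesApi.list`.

```javascript
// Framework-free model of the hook's probe. `listNodes` is any async function
// that resolves with the /api/nodes payload or rejects on failure.
function createModeProbe(listNodes) {
  const state = { enabled: false, nodes: [], loading: true }
  async function probe() {
    try {
      const data = await listNodes()
      // A non-array payload still counts as "distributed", just with no nodes.
      state.nodes = Array.isArray(data) ? data : []
      state.enabled = true
    } catch {
      // Any failure, including a 503, means standalone mode.
      state.enabled = false
      state.nodes = []
    } finally {
      state.loading = false
    }
    return { ...state }
  }
  return { state, probe }
}
```

Because each call overwrites `enabled`, a later failed `refetch` flips a previously distributed instance back to standalone.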
53
core/http/react-ui/src/hooks/useGalleryEnrichment.js
vendored
Normal file
@@ -0,0 +1,53 @@
import { useState, useEffect, useCallback } from 'react'
import { modelsApi, backendsApi } from '../utils/api'

// useGalleryEnrichment fetches the full model + backend gallery once and
// returns lookup helpers used by the Manage page. The Manage list APIs only
// know name/version/alias — descriptions, icons, licenses, tags, and links
// live on the gallery side. Cross-referencing here lets us light up the
// installed lists with the same metadata the Install pages show, instead of
// rendering them as bare names.
//
// Items not present in the gallery (custom imports, external OCI installs)
// resolve to `null` — callers fall back to a neutral icon + "no description".
export function useGalleryEnrichment() {
  const [modelMap, setModelMap] = useState(() => new Map())
  const [backendMap, setBackendMap] = useState(() => new Map())
  const [loaded, setLoaded] = useState(false)

  useEffect(() => {
    let cancelled = false
    Promise.allSettled([
      modelsApi.list({ items: 9999, page: 1 }),
      backendsApi.list({ items: 9999, page: 1 }),
    ]).then(([m, b]) => {
      if (cancelled) return
      const mm = new Map()
      if (m.status === 'fulfilled') {
        const list = m.value?.models || []
        for (const x of list) {
          const key = x.name || x.id
          if (key) mm.set(key, x)
        }
      }
      const bm = new Map()
      if (b.status === 'fulfilled') {
        const raw = b.value
        const list = Array.isArray(raw?.backends) ? raw.backends : Array.isArray(raw) ? raw : []
        for (const x of list) {
          const key = x.name || x.id
          if (key) bm.set(key, x)
        }
      }
      setModelMap(mm)
      setBackendMap(bm)
      setLoaded(true)
    })
    return () => { cancelled = true }
  }, [])

  const enrichModel = useCallback((name) => (name ? modelMap.get(name) || null : null), [modelMap])
  const enrichBackend = useCallback((name) => (name ? backendMap.get(name) || null : null), [backendMap])

  return { enrichModel, enrichBackend, loaded }
}
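The indexing step inside the hook's effect can be pulled out as a pure function, which makes its tolerance for partial failures and differing response shapes explicit. `indexByName` and `buildLookups` are hypothetical names, and the gallery entries in the test (`phi-2`, `whisper`) are made-up sample data.

```javascript
// Build a name -> entry map, falling back to `id` when `name` is missing and
// skipping entries that have neither.
function indexByName(list) {
  const map = new Map()
  for (const x of list) {
    const key = x.name || x.id
    if (key) map.set(key, x)
  }
  return map
}

// Takes the two Promise.allSettled outcomes (models, backends). A rejected
// fetch simply yields an empty map, so lookups return null instead of throwing.
function buildLookups([m, b]) {
  const models = m.status === 'fulfilled' ? (m.value?.models || []) : []
  const raw = b.status === 'fulfilled' ? b.value : null
  // Backends may arrive as { backends: [...] } or as a bare array.
  const backends = Array.isArray(raw?.backends) ? raw.backends : Array.isArray(raw) ? raw : []
  const modelMap = indexByName(models)
  const backendMap = indexByName(backends)
  return {
    enrichModel: name => (name ? modelMap.get(name) || null : null),
    enrichBackend: name => (name ? backendMap.get(name) || null : null),
  }
}
```

Using `Promise.allSettled` rather than `Promise.all` means one unavailable gallery does not hide metadata from the other.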
12
core/http/react-ui/src/hooks/useModels.js
vendored
@@ -6,9 +6,9 @@ export function useModels(capability) {
|
||||
const [loading, setLoading] = useState(true)
|
||||
const [error, setError] = useState(null)
|
||||
|
||||
const fetchModels = useCallback(async () => {
|
||||
const fetchModels = useCallback(async ({ silent = false } = {}) => {
|
||||
try {
|
||||
setLoading(true)
|
||||
if (!silent) setLoading(true)
|
||||
const data = await modelsApi.listCapabilities()
|
||||
let items = data?.data || []
|
||||
if (capability) {
|
||||
@@ -30,15 +30,19 @@ export function useModels(capability) {
|
||||
setError(err.message)
|
||||
}
|
||||
} finally {
|
||||
setLoading(false)
|
||||
if (!silent) setLoading(false)
|
||||
}
|
||||
}, [capability])
|
||||
|
||||
// Subsequent refetches stay silent so consumers don't blank their tables
|
||||
// (e.g. the Manage page auto-refreshes every 10s in distributed mode).
|
||||
const refetch = useCallback(() => fetchModels({ silent: true }), [fetchModels])
|
||||
|
||||
useEffect(() => {
|
||||
fetchModels()
|
||||
}, [fetchModels])
|
||||
|
||||
return { models, loading, error, refetch: fetchModels }
|
||||
return { models, loading, error, refetch }
|
||||
}
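The silent-refetch change in `useModels` separates the first load, which toggles the loading flag, from background refreshes, which leave it alone so tables do not blank out. A minimal framework-free sketch of that pattern follows, assuming a caller-supplied async `fetchItems` and a `setLoading` callback standing in for React state; `createLoader` is a hypothetical name and is not part of LocalAI.

```javascript
// Hypothetical, framework-free version of the silent-refetch pattern in useModels.
function createLoader(fetchItems, setLoading) {
  async function load({ silent = false } = {}) {
    // Only user-visible loads flip the spinner; silent ones leave the UI as-is.
    if (!silent) setLoading(true)
    try {
      return await fetchItems()
    } finally {
      // Runs on success and on failure, so a visible load never leaves the spinner stuck.
      if (!silent) setLoading(false)
    }
  }
  // Background refreshes (e.g. a polling timer) always go through the silent path.
  const refetch = () => load({ silent: true })
  return { load, refetch }
}

const loadingStates = []
const { load, refetch } = createLoader(async () => ['model-a'], (v) => loadingStates.push(v))

load()
  .then(() => refetch())
  .then(() => console.log(loadingStates)) // [ true, false ]
```

Exposing only the silent variant as `refetch` keeps polling consumers from accidentally triggering the full loading state.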

export function useGalleryModels(params = {}) {

@@ -12,14 +12,17 @@ html {
}

body {
font-family: 'Space Grotesk', -apple-system, BlinkMacSystemFont, sans-serif;
font-family: var(--font-sans);
font-size: var(--text-base);
font-weight: var(--font-weight-regular);
line-height: var(--leading-normal);
font-feature-settings: "ss01", "ss03", "cv11";
letter-spacing: -0.005em;
min-height: 100%;
background-color: var(--color-bg-primary);
color: var(--color-text-primary);
transition: background-color 200ms ease, color 200ms ease;
transition: background-color var(--duration-normal) var(--ease-default),
color var(--duration-normal) var(--ease-default);
}

#root {
@@ -27,36 +30,73 @@ body {
min-height: 100dvh;
}

/* Scrollbar */
::-webkit-scrollbar { width: 6px; height: 6px; }
::-webkit-scrollbar-track { background: var(--color-bg-primary); }
::-webkit-scrollbar-thumb { background: var(--color-bg-secondary); border-radius: 3px; }
::-webkit-scrollbar-thumb:hover { background: var(--color-primary); }
* { scrollbar-width: thin; scrollbar-color: var(--color-bg-secondary) var(--color-bg-primary); }
/* Global selection + focus */
::selection {
background: var(--color-primary-light);
color: var(--color-text-primary);
}

/* Typography */
/* Scrollbar — slightly wider, warmer thumb */
::-webkit-scrollbar { width: 10px; height: 10px; }
::-webkit-scrollbar-track { background: transparent; }
::-webkit-scrollbar-thumb {
background: var(--color-border-default);
border-radius: var(--radius-sm);
border: 2px solid var(--color-bg-primary);
}
::-webkit-scrollbar-thumb:hover { background: var(--color-border-strong); }
* { scrollbar-width: thin; scrollbar-color: var(--color-border-default) transparent; }

/* Typography — editorial hierarchy */
h1, h2, h3, h4, h5, h6 {
font-family: 'Space Grotesk', sans-serif;
font-family: var(--font-sans);
color: var(--color-text-primary);
line-height: var(--leading-tight);
letter-spacing: -0.01em;
}
h1 { font-size: var(--text-2xl); font-weight: var(--font-weight-semibold); }
h2 { font-size: var(--text-xl); font-weight: var(--font-weight-semibold); }
h3 { font-size: var(--text-lg); font-weight: var(--font-weight-semibold); }
h4 { font-size: var(--text-base); font-weight: var(--font-weight-semibold); }
h5, h6 { font-size: var(--text-sm); font-weight: var(--font-weight-semibold); }
h1 { font-size: var(--text-3xl); font-weight: var(--font-weight-medium); letter-spacing: -0.02em; }
h2 { font-size: var(--text-2xl); font-weight: var(--font-weight-medium); letter-spacing: -0.015em; }
h3 { font-size: var(--text-xl); font-weight: var(--font-weight-medium); letter-spacing: -0.01em; }
h4 { font-size: var(--text-lg); font-weight: var(--font-weight-medium); letter-spacing: -0.005em; }
h5 { font-size: var(--text-base);font-weight: var(--font-weight-semibold); }
h6 {
font-size: var(--text-xs); font-weight: var(--font-weight-semibold);
text-transform: uppercase; letter-spacing: 0.12em;
color: var(--color-text-muted);
}

code, pre {
font-family: 'JetBrains Mono', monospace;
code, pre, kbd, .mono {
font-family: var(--font-mono);
}

kbd {
display: inline-block;
padding: 1px 5px;
font-size: 0.75em;
font-weight: var(--font-weight-medium);
background: var(--color-bg-tertiary);
border: 1px solid var(--color-border-default);
border-radius: var(--radius-sm);
color: var(--color-text-secondary);
line-height: 1.4;
}

a {
color: var(--color-primary);
text-decoration: none;
transition: color var(--duration-fast) var(--ease-default);
}
a:hover {
color: var(--color-primary-hover);
}

/* Honor prefers-reduced-motion globally */
@media (prefers-reduced-motion: reduce) {
*, *::before, *::after {
animation-duration: 0.01ms !important;
animation-iteration-count: 1 !important;
transition-duration: 0.01ms !important;
scroll-behavior: auto !important;
}
}

/* Utility classes */

@@ -403,7 +403,7 @@ export default function Account() {

if (!authEnabled) {
return (
<div className="page">
<div className="page page--narrow">
<div className="empty-state">
<div className="empty-state-icon"><i className="fas fa-user-gear" /></div>
<h2 className="empty-state-title">Account unavailable</h2>
@@ -418,7 +418,7 @@ export default function Account() {
const visibleTabs = isLocal ? TABS : TABS.filter(t => t.id !== 'security')

return (
<div className="page account-page">
<div className="page page--narrow account-page">
{/* Header */}
<div className="page-header">
<h1 className="page-title">Account</h1>

@@ -101,6 +101,7 @@ export default function AgentChat() {
const messagesEndRef = useRef(null)
const messagesRef = useRef(null)
const textareaRef = useRef(null)
const stickToBottomRef = useRef(true)
const eventSourceRef = useRef(null)
const messageIdCounter = useRef(0)
const addMessageRef = useRef(addMessage)
@@ -260,11 +261,31 @@ export default function AgentChat() {
}
}, [name, userId, addToast, nextId])

// Auto-scroll to bottom
// Track whether the user is pinned to the bottom. If they scroll up
// while a response is streaming, stop forcing them back down.
useEffect(() => {
const el = messagesRef.current
if (!el) return
const onScroll = () => {
const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight
stickToBottomRef.current = distanceFromBottom < 80
}
el.addEventListener('scroll', onScroll, { passive: true })
return () => el.removeEventListener('scroll', onScroll)
}, [])

// Auto-scroll only when the user hasn't scrolled away from the bottom.
useEffect(() => {
if (!stickToBottomRef.current) return
messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' })
}, [messages, streamContent, streamReasoning, streamToolCalls])

// When switching conversations, snap to bottom and re-pin.
useEffect(() => {
stickToBottomRef.current = true
messagesEndRef.current?.scrollIntoView({ behavior: 'auto' })
}, [activeId])
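The pinning rule in `AgentChat` reduces to one comparison: the user counts as "at the bottom" while the distance between the viewport's lower edge and the end of the content is under 80px. A minimal sketch of that check as a pure function, so it can be exercised without a DOM; `isPinnedToBottom` is a hypothetical name, and the plain object stands in for the scroll container's `scrollHeight`, `scrollTop`, and `clientHeight` properties.

```javascript
// Hypothetical pure version of the check inside AgentChat's onScroll handler.
// `metrics` mirrors the element's scrollHeight, scrollTop and clientHeight.
function isPinnedToBottom(metrics, threshold = 80) {
  const { scrollHeight, scrollTop, clientHeight } = metrics
  // Pixels of content still hidden below the visible area.
  const distanceFromBottom = scrollHeight - scrollTop - clientHeight
  return distanceFromBottom < threshold
}

console.log(isPinnedToBottom({ scrollHeight: 2000, scrollTop: 1850, clientHeight: 100 })) // true (50px left)
console.log(isPinnedToBottom({ scrollHeight: 2000, scrollTop: 1200, clientHeight: 100 })) // false (700px left)
```

Using a small tolerance instead of an exact `=== 0` test keeps the chat pinned even when fractional scroll positions or a just-appended line leave a few pixels of slack.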

// Highlight code blocks
useEffect(() => {
if (messagesRef.current) highlightAll(messagesRef.current)

@@ -97,7 +97,7 @@ function FormField({ field, value, onChange, disabled }) {
rows={5}
disabled={disabled}
style={field.name.includes('prompt') || field.name.includes('template') || field.name.includes('script')
? { fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' } : undefined}
? { fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' } : undefined}
/>
</div>
)
@@ -624,7 +624,7 @@ export default function AgentCreate() {
value={mcpRawJson}
onChange={(e) => setMcpRawJson(e.target.value)}
rows={16}
style={{ fontFamily: 'monospace', fontSize: '0.85rem', whiteSpace: 'pre' }}
style={{ fontFamily: 'var(--font-mono)', fontSize: '0.85rem', whiteSpace: 'pre' }}
placeholder={'{\n "mcpServers": {\n "my-server": {\n "command": "npx",\n "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],\n "env": {}\n }\n }\n}'}
/>
</div>
@@ -812,14 +812,14 @@ export default function AgentCreate() {

if (loading) {
return (
<div className="page" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}>
<div className="page page--narrow" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}>
<i className="fas fa-spinner fa-spin" style={{ fontSize: '2rem', color: 'var(--color-primary)' }} />
</div>
)
}

return (
<div className="page">
<div className="page page--narrow">
<style>{`
.agent-form-container {
display: flex;

@@ -43,7 +43,7 @@ function TraceCard({ trace, index }) {
{trace.type || 'unknown'}
</span>
{trace.tool_name && (
<span style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.75rem', color: 'var(--color-text-secondary)' }}>
<span style={{ fontFamily: 'var(--font-mono)', fontSize: '0.75rem', color: 'var(--color-text-secondary)' }}>
{trace.tool_name}
</span>
)}
@@ -60,7 +60,7 @@ function TraceCard({ trace, index }) {
{trace.content && (
<pre style={{
whiteSpace: 'pre-wrap', wordBreak: 'break-word', margin: 0,
fontFamily: "'JetBrains Mono', monospace", fontSize: '0.75rem',
fontFamily: 'var(--font-mono)', fontSize: '0.75rem',
color: 'var(--color-text-secondary)', lineHeight: 1.6,
}}>
{trace.content}
@@ -71,7 +71,7 @@ function TraceCard({ trace, index }) {
<span style={{ fontSize: '0.6875rem', fontWeight: 600, color: 'var(--color-text-muted)' }}>Arguments:</span>
<pre style={{
whiteSpace: 'pre-wrap', wordBreak: 'break-word', margin: '4px 0 0',
fontFamily: "'JetBrains Mono', monospace", fontSize: '0.75rem',
fontFamily: 'var(--font-mono)', fontSize: '0.75rem',
color: 'var(--color-text-secondary)', lineHeight: 1.5,
}}>
{typeof trace.arguments === 'string' ? trace.arguments : JSON.stringify(trace.arguments, null, 2)}
@@ -162,9 +162,9 @@ export default function AgentJobDetails() {
return rendered
}

if (loading) return <div className="page" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}><LoadingSpinner size="lg" /></div>
if (loading) return <div className="page page--narrow" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}><LoadingSpinner size="lg" /></div>
if (!job) return (
<div className="page">
<div className="page page--narrow">
<div className="empty-state">
<div className="empty-state-icon"><i className="fas fa-search" /></div>
<h2 className="empty-state-title">Job not found</h2>
@@ -177,7 +177,7 @@ export default function AgentJobDetails() {
const traces = Array.isArray(job.traces) ? job.traces : []

return (
<div className="page" style={{ maxWidth: 900 }}>
<div className="page page--narrow">
<div className="page-header" style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<div>
<h1 className="page-title">Job Details</h1>
@@ -207,7 +207,7 @@ export default function AgentJobDetails() {
<div style={{ display: 'grid', gridTemplateColumns: 'repeat(3, 1fr)', gap: 'var(--spacing-md)' }}>
<div>
<span className="form-label">Job ID</span>
<p style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem', wordBreak: 'break-all' }}>{job.id}</p>
<p style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem', wordBreak: 'break-all' }}>{job.id}</p>
</div>
<div>
<span className="form-label">Task</span>
@@ -264,7 +264,7 @@ export default function AgentJobDetails() {
</h3>
<div style={{ display: 'flex', flexWrap: 'wrap', gap: 'var(--spacing-xs)' }}>
{Object.entries(job.cron_parameters).map(([k, v]) => (
<span key={k} className="badge badge-info" style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.75rem' }}>
<span key={k} className="badge badge-info" style={{ fontFamily: 'var(--font-mono)', fontSize: '0.75rem' }}>
{k}={v}
</span>
))}
@@ -281,7 +281,7 @@ export default function AgentJobDetails() {
</h3>
<div style={{ display: 'flex', flexWrap: 'wrap', gap: 'var(--spacing-xs)' }}>
{Object.entries(job.parameters).map(([k, v]) => (
<span key={k} className="badge badge-info" style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.75rem' }}>
<span key={k} className="badge badge-info" style={{ fontFamily: 'var(--font-mono)', fontSize: '0.75rem' }}>
{k}={v}
</span>
))}

@@ -213,7 +213,7 @@ export default function AgentJobs() {
// Wizard: no models installed
if (!loading && models.length === 0) {
return (
<div className="page">
<div className="page page--wide">
<div className="page-header">
<h1 className="page-title">Agent Jobs</h1>
<p className="page-subtitle">Manage agent tasks and automated workflows</p>
@@ -240,7 +240,7 @@ export default function AgentJobs() {
// Wizard: models but no MCP
if (!loading && models.length > 0 && !hasMCPModels && tasks.length === 0) {
return (
<div className="page">
<div className="page page--wide">
<div className="page-header">
<h1 className="page-title">Agent Jobs</h1>
<p className="page-subtitle">Manage agent tasks and automated workflows</p>
@@ -253,7 +253,7 @@ export default function AgentJobs() {
</p>
<div style={{ background: 'var(--color-bg-primary)', borderRadius: 'var(--radius-md)', padding: 'var(--spacing-md)', maxWidth: 500, margin: '0 auto var(--spacing-md)', textAlign: 'left' }}>
<p style={{ fontSize: '0.8125rem', fontWeight: 600, marginBottom: 'var(--spacing-xs)' }}>Example MCP configuration (YAML):</p>
<pre style={{ fontSize: '0.75rem', fontFamily: "'JetBrains Mono', monospace", color: 'var(--color-text-secondary)', whiteSpace: 'pre-wrap' }}>{`mcp:
<pre style={{ fontSize: '0.75rem', fontFamily: 'var(--font-mono)', color: 'var(--color-text-secondary)', whiteSpace: 'pre-wrap' }}>{`mcp:
stdio:
- name: my-tool
command: /path/to/tool
@@ -273,7 +273,7 @@ export default function AgentJobs() {
}

return (
<div className="page">
<div className="page page--wide">
<div className="page-header" style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<div>
<h1 className="page-title">Agent Jobs</h1>
@@ -345,7 +345,7 @@ export default function AgentJobs() {
</td>
<td>
{task.cron ? (
<span className="badge badge-info" style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.6875rem' }}>
<span className="badge badge-info" style={{ fontFamily: 'var(--font-mono)', fontSize: '0.6875rem' }}>
{task.cron}
</span>
) : '-'}
@@ -426,7 +426,7 @@ export default function AgentJobs() {
{filteredJobs.map(job => (
<tr key={job.id}>
<td>
<a onClick={() => navigate(`/app/agent-jobs/jobs/${job.id}`)} style={{ cursor: 'pointer', color: 'var(--color-primary)', fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}>
<a onClick={() => navigate(`/app/agent-jobs/jobs/${job.id}`)} style={{ cursor: 'pointer', color: 'var(--color-primary)', fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>
{job.id?.slice(0, 12)}...
</a>
</td>
@@ -510,7 +510,7 @@ export default function AgentJobs() {
<tbody>
{(items || []).map(job => (
<tr key={job.id}>
<td style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}>{job.id?.slice(0, 12)}...</td>
<td style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>{job.id?.slice(0, 12)}...</td>
<td>{job.task_id || '-'}</td>
<td>{statusBadge(job.status)}</td>
<td style={{ fontSize: '0.8125rem', color: 'var(--color-text-secondary)' }}>{formatDate(job.created_at)}</td>
@@ -566,7 +566,7 @@ export default function AgentJobs() {
onChange={(e) => setExecuteParams(e.target.value)}
rows={5}
placeholder={`topic=AI trends\nformat=markdown`}
style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}
style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}
/>
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 'var(--spacing-xs)' }}>
These will be available as {'{{.parameter_name}}'} in the prompt template.
@@ -590,7 +590,7 @@ export default function AgentJobs() {
{executeMultimedia[type].map((item, i) => (
<div key={i} style={{
display: 'flex', alignItems: 'center', justifyContent: 'space-between',
background: 'var(--color-bg-primary)', borderRadius: 'var(--radius-sm)', padding: '4px 8px', fontSize: '0.75rem',
background: 'var(--color-bg-primary)', borderRadius: 'var(--radius-sm)', padding: 'var(--spacing-xs) var(--spacing-sm)', fontSize: '0.75rem',
}}>
<span style={{ overflow: 'hidden', textOverflow: 'ellipsis', whiteSpace: 'nowrap' }}>{item.name || item.url?.slice(0, 40)}</span>
<button onClick={() => removeMultimedia(type, i)} style={{ background: 'none', border: 'none', color: 'var(--color-error)', cursor: 'pointer', padding: '2px 4px' }}>

@@ -260,7 +260,7 @@ export default function AgentStatus() {
const tree = buildTree(observables)

return (
<div className="page">
<div className="page page--wide">
<style>{`
.as-card {
background: var(--color-bg-secondary);
@@ -294,7 +294,7 @@ export default function AgentStatus() {
.as-id {
font-size: 0.6875rem;
color: var(--color-text-muted);
font-family: 'JetBrains Mono', monospace;
font-family: var(--font-mono);
}
.as-summary-item {
display: flex; align-items: center; gap: 6px;
@@ -303,7 +303,7 @@ export default function AgentStatus() {
}
.as-summary-item i { font-size: 0.625rem; flex-shrink: 0; }
.as-summary-creation i { color: var(--color-primary); }
.as-summary-tool-call i { color: #f59e0b; }
.as-summary-tool-call i { color: var(--color-warning); }
.as-summary-completion i { color: var(--color-success); }
.as-summary-error i { color: var(--color-error); }
.as-card-body {
@@ -327,13 +327,13 @@ export default function AgentStatus() {
background: var(--color-bg-tertiary); color: var(--color-text-muted);
margin-right: 4px; vertical-align: middle;
}
.as-tag-error { background: var(--color-error); color: #fff; }
.as-tag-error { background: var(--color-error); color: var(--color-text-inverse); }
.as-error-text { color: var(--color-error); }
.as-raw { margin-top: var(--spacing-sm); }
.as-raw summary { font-size: 0.75rem; color: var(--color-text-muted); cursor: pointer; }
.as-json {
background: var(--color-bg-tertiary); border-radius: var(--radius-sm);
padding: var(--spacing-sm); font-family: 'JetBrains Mono', monospace;
padding: var(--spacing-sm); font-family: var(--font-mono);
font-size: 0.75rem; overflow-x: auto; white-space: pre-wrap;
word-break: break-word; max-height: 300px; overflow-y: auto;
}

@@ -159,12 +159,12 @@ export default function AgentTaskDetails() {

const formatDate = (d) => d ? new Date(d).toLocaleString() : '-'

if (loading) return <div className="page" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}><LoadingSpinner size="lg" /></div>
if (loading) return <div className="page page--narrow" style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}><LoadingSpinner size="lg" /></div>

// View mode
if (!isNew && !isEdit) {
return (
<div className="page" style={{ maxWidth: 900 }}>
<div className="page page--narrow">
<div className="page-header" style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<div>
<h1 className="page-title">{task.name || 'Task Details'}</h1>
@@ -198,7 +198,7 @@ export default function AgentTaskDetails() {
{task.cron && (
<div>
<span className="form-label">Cron Schedule</span>
<p style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}>{task.cron}</p>
<p style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>{task.cron}</p>
</div>
)}
</div>
@@ -229,13 +229,13 @@ export default function AgentTaskDetails() {
<div style={{ display: 'flex', flexDirection: 'column', gap: 'var(--spacing-md)' }}>
<div>
<span className="form-label">Execute by name</span>
<pre style={{ background: 'var(--color-bg-primary)', padding: 'var(--spacing-sm)', borderRadius: 'var(--radius-md)', fontSize: '0.75rem', fontFamily: "'JetBrains Mono', monospace", whiteSpace: 'pre-wrap', overflow: 'auto' }}>
<pre style={{ background: 'var(--color-bg-primary)', padding: 'var(--spacing-sm)', borderRadius: 'var(--radius-md)', fontSize: '0.75rem', fontFamily: 'var(--font-mono)', whiteSpace: 'pre-wrap', overflow: 'auto' }}>
{`curl -X POST ${window.location.origin}${basePath}/api/agent/tasks/${encodeURIComponent(task.name)}/execute`}
</pre>
</div>
<div>
<span className="form-label">Execute with multimedia</span>
<pre style={{ background: 'var(--color-bg-primary)', padding: 'var(--spacing-sm)', borderRadius: 'var(--radius-md)', fontSize: '0.75rem', fontFamily: "'JetBrains Mono', monospace", whiteSpace: 'pre-wrap', overflow: 'auto' }}>
<pre style={{ background: 'var(--color-bg-primary)', padding: 'var(--spacing-sm)', borderRadius: 'var(--radius-md)', fontSize: '0.75rem', fontFamily: 'var(--font-mono)', whiteSpace: 'pre-wrap', overflow: 'auto' }}>
{`curl -X POST ${window.location.origin}${basePath}/api/agent/tasks/${encodeURIComponent(task.name)}/execute \\
-H "Content-Type: application/json" \\
-d '{"multimedia": {"images": [{"url": "https://example.com/image.jpg"}]}}'`}
@@ -243,7 +243,7 @@ export default function AgentTaskDetails() {
</div>
<div>
<span className="form-label">Check job status</span>
<pre style={{ background: 'var(--color-bg-primary)', padding: 'var(--spacing-sm)', borderRadius: 'var(--radius-md)', fontSize: '0.75rem', fontFamily: "'JetBrains Mono', monospace", whiteSpace: 'pre-wrap', overflow: 'auto' }}>
<pre style={{ background: 'var(--color-bg-primary)', padding: 'var(--spacing-sm)', borderRadius: 'var(--radius-md)', fontSize: '0.75rem', fontFamily: 'var(--font-mono)', whiteSpace: 'pre-wrap', overflow: 'auto' }}>
{`curl ${window.location.origin}${basePath}/api/agent/jobs/<job-id>`}
</pre>
</div>
@@ -261,7 +261,7 @@ export default function AgentTaskDetails() {
<div key={i} style={{ background: 'var(--color-bg-primary)', borderRadius: 'var(--radius-md)', padding: 'var(--spacing-sm)', marginBottom: 'var(--spacing-sm)' }}>
<div style={{ display: 'flex', gap: 'var(--spacing-sm)', fontSize: '0.8125rem' }}>
<span className="badge badge-info">{wh.method || 'POST'}</span>
<span style={{ fontFamily: "'JetBrains Mono', monospace" }}>{wh.url}</span>
<span style={{ fontFamily: 'var(--font-mono)' }}>{wh.url}</span>
</div>
</div>
))}
@@ -283,7 +283,7 @@ export default function AgentTaskDetails() {
<tbody>
{jobHistory.map(job => (
<tr key={job.id}>
<td style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}>
<td style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}>
{job.id?.slice(0, 12)}...
</td>
<td>{statusBadge(job.status)}</td>
@@ -306,7 +306,7 @@ export default function AgentTaskDetails() {

// Edit/Create form
return (
<div className="page" style={{ maxWidth: 900 }}>
<div className="page page--narrow">
<div className="page-header" style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
<h1 className="page-title">{isNew ? 'Create Task' : 'Edit Task'}</h1>
<button className="btn btn-secondary btn-sm" onClick={() => navigate('/app/agent-jobs')}>
@@ -351,7 +351,7 @@ export default function AgentTaskDetails() {
onChange={(e) => updateField('prompt', e.target.value)}
rows={8}
placeholder={`Write a summary about {{.topic}} in {{.format}} format.`}
style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}
style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}
/>
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 'var(--spacing-xs)' }}>
Use {'{{.parameter_name}}'} for dynamic parameters. Parameters are provided when executing the task.
@@ -376,7 +376,7 @@ export default function AgentTaskDetails() {
value={task.cron}
onChange={(e) => { updateField('cron', e.target.value); validateCron(e.target.value) }}
placeholder="0 */6 * * *"
style={{ fontFamily: "'JetBrains Mono', monospace" }}
style={{ fontFamily: 'var(--font-mono)' }}
/>
{cronError && <p style={{ color: 'var(--color-error)', fontSize: '0.75rem', marginTop: 4 }}>{cronError}</p>}
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 'var(--spacing-xs)' }}>
@@ -392,7 +392,7 @@ export default function AgentTaskDetails() {
onChange={(e) => updateField('cron_parameters', e.target.value)}
rows={3}
placeholder={`topic=daily news\nformat=bullet points`}
style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}
style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}
/>
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 'var(--spacing-xs)' }}>
Default parameters used when the cron triggers the task.
@@ -437,7 +437,7 @@ export default function AgentTaskDetails() {
</div>
<div className="form-group" style={{ marginTop: 'var(--spacing-xs)' }}>
<label className="form-label">Headers (JSON)</label>
<input className="input" value={ms.headers} onChange={(e) => updateMultimediaSource(i, 'headers', e.target.value)} placeholder='{"Authorization": "Bearer ..."}' style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }} />
<input className="input" value={ms.headers} onChange={(e) => updateMultimediaSource(i, 'headers', e.target.value)} placeholder='{"Authorization": "Bearer ..."}' style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }} />
</div>
</div>
))
@@ -479,7 +479,7 @@ export default function AgentTaskDetails() {
</div>
<div className="form-group" style={{ marginTop: 'var(--spacing-xs)' }}>
<label className="form-label">Headers (JSON)</label>
<input className="input" value={wh.headers} onChange={(e) => updateWebhook(i, 'headers', e.target.value)} placeholder='{"Content-Type": "application/json"}' style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }} />
<input className="input" value={wh.headers} onChange={(e) => updateWebhook(i, 'headers', e.target.value)} placeholder='{"Content-Type": "application/json"}' style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }} />
</div>
<div className="form-group" style={{ marginTop: 'var(--spacing-xs)' }}>
<label className="form-label">Payload Template (Go template syntax)</label>
@@ -489,7 +489,7 @@ export default function AgentTaskDetails() {
onChange={(e) => updateWebhook(i, 'payload_template', e.target.value)}
rows={3}
placeholder={`{"text": "Job {{.Status}}: {{if .Error}}Error: {{.Error}}{{else}}{{.Result}}{{end}}"}`}
style={{ fontFamily: "'JetBrains Mono', monospace", fontSize: '0.8125rem' }}
style={{ fontFamily: 'var(--font-mono)', fontSize: '0.8125rem' }}
/>
<p style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)', marginTop: 2 }}>
Available: {'{{.Job}}'} {'{{.Task}}'} {'{{.Result}}'} {'{{.Error}}'} {'{{.Status}}'}

@@ -136,7 +136,7 @@ export default function Agents() {
}

return (
<div className="page">
<div className="page page--wide">
<style>{`
.agents-import-input { display: none; }
.agents-toolbar {

@@ -149,7 +149,7 @@ function BackendLogsDetail({ modelId }) {
}

return (
<div className="page">
<div className="page page--wide">
<div className="page-header">
<div>
<h1 className="page-title" style={{ marginBottom: 0 }}>
@@ -229,7 +229,7 @@ function BackendLogsDetail({ modelId }) {
borderRadius: 'var(--radius-md)',
overflow: 'auto',
maxHeight: 'calc(100vh - 280px)',
fontFamily: 'JetBrains Mono, Consolas, monospace',
fontFamily: 'var(--font-mono)',
fontSize: '0.75rem',
lineHeight: '1.5',
}}
@@ -283,7 +283,7 @@ export default function BackendLogs() {

// No model specified — redirect to System page
return (
<div className="page">
<div className="page page--wide">
<div className="empty-state">
<div className="empty-state-icon"><i className="fas fa-terminal" /></div>
<h2 className="empty-state-title">No model selected</h2>

Some files were not shown because too many files have changed in this diff