mirror of https://github.com/mudler/LocalAI.git
synced 2026-05-22 07:38:26 -04:00
Compare commits
45 Commits
distribute...issue-9478
| Author | SHA1 | Date | |
|---|---|---|---|
| | 798b5b2d84 | | |
| | cd7b035716 | | |
| | 0f3bb2d647 | | |
| | 607efe5a4c | | |
| | 7d8c1d5e45 | | |
| | d18d434bb2 | | |
| | 39573ecd2a | | |
| | a7dbb2a83d | | |
| | 3ad9b16c29 | | |
| | c806d5ab73 | | |
| | 47efaf5b43 | | |
| | 315b634a91 | | |
| | 6b245299d7 | | |
| | 677c0315c1 | | |
| | 478522ce4d | | |
| | c54897ad44 | | |
| | 8bb1e8f21f | | |
| | cd94a0b61a | | |
| | 047bc48fa9 | | |
| | 01bd8ae5d0 | | |
| | d9808769be | | |
| | 5973c0a9df | | |
| | 486b5e25a3 | | |
| | c66c41e8d7 | | |
| | 02bb715c0a | | |
| | 8ab56e2ad3 | | |
| | ecf85fde9e | | |
| | 6480715a16 | | |
| | f683231811 | | |
| | 960757f0e8 | | |
| | 865fd552f5 | | |
| | cb77a5a4b9 | | |
| | 60633c4dd5 | | |
| | 9e44944cc1 | | |
| | 372eb08dcf | | |
| | 28091d626e | | |
| | cae79d9107 | | |
| | babbbc6ec8 | | |
| | 3804497186 | | |
| | fda1c553a1 | | |
| | b27de08fff | | |
| | 510f791ccc | | |
| | 369c50a41c | | |
| | 75a63f87d8 | | |
| | 9cd8d7951f | | |
101
.agents/ai-coding-assistants.md
Normal file
@@ -0,0 +1,101 @@
# AI Coding Assistants

This document provides guidance for AI tools and developers using AI
assistance when contributing to LocalAI.

**LocalAI follows the same guidelines as the Linux kernel project for
AI-assisted contributions.** See the upstream policy here:
<https://docs.kernel.org/process/coding-assistants.html>

The rules below mirror that policy, adapted to LocalAI's license and
project layout. If anything is unclear, the kernel document is the
authoritative reference for intent.

AI tools helping with LocalAI development should follow the standard
project development process:

- [CONTRIBUTING.md](../CONTRIBUTING.md) — development workflow, commit
conventions, and PR guidelines
- [.agents/coding-style.md](coding-style.md) — code style, editorconfig,
logging, and documentation conventions
- [.agents/building-and-testing.md](building-and-testing.md) — build and
test procedures

## Licensing and Legal Requirements

All contributions must comply with LocalAI's licensing requirements:

- LocalAI is licensed under the **MIT License** — see the [LICENSE](../LICENSE)
file
- New source files should use the SPDX license identifier `MIT` where
applicable to the file type
- Contributions must be compatible with the MIT License and must not
introduce code under incompatible licenses (e.g., GPL) without an
explicit discussion with maintainers

## Signed-off-by and Developer Certificate of Origin

**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally
certify the Developer Certificate of Origin (DCO). The human submitter
is responsible for:

- Reviewing all AI-generated code
- Ensuring compliance with licensing requirements
- Adding their own `Signed-off-by` tag (when the project requires DCO)
to certify the contribution
- Taking full responsibility for the contribution

AI agents MUST NOT add `Co-Authored-By` trailers for themselves either.
A human reviewer owns the contribution; the AI's involvement is recorded
via `Assisted-by` (see below).

## Attribution

When AI tools contribute to LocalAI development, proper attribution helps
track the evolving role of AI in the development process. Contributions
should include an `Assisted-by` tag in the commit message trailer in the
following format:

```
Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
```

Where:

- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`,
`Copilot`, `Cursor`)
- `MODEL_VERSION` — specific model version used (e.g.,
`claude-opus-4-7`, `gpt-5`)
- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the
agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)

Basic development tools (git, go, make, editors) should **not** be listed.

### Example

```
fix(llama-cpp): handle empty tool call arguments

Previously the parser panicked when the model returned a tool call with
an empty arguments object. Fall back to an empty JSON object in that
case so downstream consumers receive a valid payload.

Assisted-by: Claude:claude-opus-4-7 golangci-lint
Signed-off-by: Jane Developer <jane@example.com>
```

## Scope and Responsibility

Using an AI assistant does not reduce the contributor's responsibility.
The human submitter must:

- Understand every line that lands in the PR
- Verify that generated code compiles, passes tests, and follows the
project style
- Confirm that any referenced APIs, flags, or file paths actually exist
in the current tree (AI models may hallucinate identifiers)
- Not submit AI output verbatim without review

Reviewers may ask for clarification on any change regardless of how it
was produced. "An AI wrote it" is not an acceptable answer to a design
question.
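The `Assisted-by` trailer format described in this policy lends itself to a mechanical check. The following is a minimal sketch in shell, assuming a grammar the policy does not formally define: one `AGENT_NAME:MODEL_VERSION` token followed by zero or more space-separated tool names. The function name `has_valid_assisted_by` and the regular expression are illustrative assumptions and are not part of LocalAI.

```shell
#!/bin/sh
# Hypothetical helper, not part of the LocalAI tree.

# Assumed grammar: "Assisted-by: AGENT:MODEL" plus optional tool names.
# The policy gives a format string, not a formal grammar, so this is a guess.
assisted_by_re='^Assisted-by: [^:[:space:]]+:[^[:space:]]+( [^[:space:]]+)*$'

# Succeeds if any line of the commit message ($1) is a well-formed trailer.
has_valid_assisted_by() {
  printf '%s\n' "$1" | grep -Eq "$assisted_by_re"
}
```

A `commit-msg` hook could call it as `has_valid_assisted_by "$(cat "$1")"`. Whether a missing trailer should reject the commit or only warn is a project decision this sketch does not make.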
14
.github/workflows/backend.yml
vendored
@@ -30,6 +30,7 @@ jobs:
skip-drivers: ${{ matrix.skip-drivers }}
context: ${{ matrix.context }}
ubuntu-version: ${{ matrix.ubuntu-version }}
amdgpu-targets: ${{ matrix.amdgpu-targets }}
secrets:
dockerUsername: ${{ secrets.DOCKERHUB_USERNAME }}
dockerPassword: ${{ secrets.DOCKERHUB_PASSWORD }}
@@ -1623,19 +1624,6 @@ jobs:
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
platforms: 'linux/amd64'
tag-latest: 'auto'
tag-suffix: '-gpu-rocm-hipblas-whisperx'
runs-on: 'bigger-runner'
base-image: "rocm/dev-ubuntu-24.04:7.2.1"
skip-drivers: 'false'
backend: "whisperx"
dockerfile: "./backend/Dockerfile.python"
context: "./"
ubuntu-version: '2404'
- build-type: 'hipblas'
cuda-major-version: ""
cuda-minor-version: ""
7
.github/workflows/backend_build.yml
vendored
@@ -58,6 +58,11 @@ on:
required: false
default: '2204'
type: string
amdgpu-targets:
description: 'AMD GPU targets for ROCm/HIP builds'
required: false
default: 'gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201'
type: string
secrets:
dockerUsername:
required: false
@@ -214,6 +219,7 @@ jobs:
BASE_IMAGE=${{ inputs.base-image }}
BACKEND=${{ inputs.backend }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
context: ${{ inputs.context }}
file: ${{ inputs.dockerfile }}
cache-from: type=gha
@@ -235,6 +241,7 @@ jobs:
BASE_IMAGE=${{ inputs.base-image }}
BACKEND=${{ inputs.backend }}
UBUNTU_VERSION=${{ inputs.ubuntu-version }}
AMDGPU_TARGETS=${{ inputs.amdgpu-targets }}
context: ${{ inputs.context }}
file: ${{ inputs.dockerfile }}
cache-from: type=gha
27
.github/workflows/gallery-agent.yaml
vendored
@@ -54,24 +54,41 @@ jobs:
REPO: ${{ github.repository }}
SEARCH: 'gallery agent in:title'
run: |
# Walk open gallery-agent PRs and act on maintainer comments:
# Walk gallery-agent PRs and act on maintainer comments:
# /gallery-agent blacklist → label `gallery-agent/blacklisted` + close (never repropose)
# /gallery-agent recreate → close without label (next run may repropose)
# Only comments from OWNER / MEMBER / COLLABORATOR are honored so
# random users can't drive the bot.
#
# We scan both open PRs AND recently-closed PRs that don't already
# carry the blacklist label. This covers the common flow where a
# maintainer writes /gallery-agent blacklist and immediately clicks
# Close — without this, the next scheduled run wouldn't see the
# command (PR is already closed) and would repropose the model.
gh label create gallery-agent/blacklisted \
--repo "$REPO" --color ededed \
--description "gallery-agent must not repropose this model" 2>/dev/null || true

prs=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" --json number --jq '.[].number')
prs_open=$(gh pr list --repo "$REPO" --state open --search "$SEARCH" \
--json number --jq '.[].number')
# Closed PRs from the last 14 days that don't yet have the blacklist label.
# Bounded window keeps the scan cheap while covering late-applied commands.
since=$(date -u -d '14 days ago' +%Y-%m-%d)
prs_closed=$(gh pr list --repo "$REPO" --state closed \
--search "$SEARCH closed:>=$since -label:gallery-agent/blacklisted" \
--json number --jq '.[].number')
prs=$(printf '%s\n%s\n' "$prs_open" "$prs_closed" | sort -u | sed '/^$/d')
for pr in $prs; do
state=$(gh pr view "$pr" --repo "$REPO" --json state --jq '.state')
cmds=$(gh pr view "$pr" --repo "$REPO" --json comments \
--jq '.comments[] | select(.authorAssociation=="OWNER" or .authorAssociation=="MEMBER" or .authorAssociation=="COLLABORATOR") | .body')
if echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+blacklist([[:space:]]|$)'; then
echo "PR #$pr: blacklist command found"
echo "PR #$pr: blacklist command found (state=$state)"
gh pr edit "$pr" --repo "$REPO" --add-label gallery-agent/blacklisted || true
gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
elif echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
if [ "$state" = "OPEN" ]; then
gh pr close "$pr" --repo "$REPO" --comment "Blacklisted via \`/gallery-agent blacklist\`. This model will not be reproposed." || true
fi
elif [ "$state" = "OPEN" ] && echo "$cmds" | grep -qE '(^|[[:space:]])/gallery-agent[[:space:]]+recreate([[:space:]]|$)'; then
echo "PR #$pr: recreate command found"
gh pr close "$pr" --repo "$REPO" --comment "Closed via \`/gallery-agent recreate\`. The next scheduled run will propose this model again." || true
fi
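The command matching in this workflow step can be exercised outside GitHub Actions. The sketch below reuses the same extended regular expression shape and the same list-merging pipeline as the step. The function names `gallery_agent_command` and `merge_pr_lists` are hypothetical, and the sketch deliberately ignores PR state, which the real step also consults before honoring `recreate`.

```shell
#!/bin/sh
# Hypothetical helpers for local experimentation; not part of the workflow.

# Print "blacklist", "recreate", or "none" for a blob of comment bodies ($1).
# blacklist is checked first, mirroring the if/elif order of the step.
gallery_agent_command() {
  for verb in blacklist recreate; do
    if printf '%s\n' "$1" |
      grep -qE "(^|[[:space:]])/gallery-agent[[:space:]]+${verb}([[:space:]]|\$)"; then
      echo "$verb"
      return 0
    fi
  done
  echo none
}

# Merge two newline-separated PR number lists, dropping duplicates and
# empty lines, as the step does for open and recently closed PRs.
merge_pr_lists() {
  printf '%s\n%s\n' "$1" "$2" | sort -u | sed '/^$/d'
}
```

The leading `(^|[[:space:]])` and trailing `([[:space:]]|$)` groups are what keep strings such as `x/gallery-agent blacklist` or `/gallery-agent blacklisted` from being treated as commands.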
14
AGENTS.md
@@ -1,11 +1,23 @@
# LocalAI Agent Instructions

This file is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
This file is the entry point for AI coding assistants (Claude Code, Cursor, Copilot, Codex, Aider, etc.) working on LocalAI. It is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.

Human contributors: see [CONTRIBUTING.md](CONTRIBUTING.md) for the development workflow.

## Policy for AI-Assisted Contributions

LocalAI follows the Linux kernel project's [guidelines for AI coding assistants](https://docs.kernel.org/process/coding-assistants.html). Before submitting AI-assisted code, read [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md). Key rules:

- **No `Signed-off-by` from AI.** Only the human submitter may sign off on the Developer Certificate of Origin.
- **No `Co-Authored-By: <AI>` trailers.** The human contributor owns the change.
- **Use an `Assisted-by:` trailer** to attribute AI involvement. Format: `Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]`.
- **The human submitter is responsible** for reviewing, testing, and understanding every line of generated code.

## Topics

| File | When to read |
|------|-------------|
| [.agents/ai-coding-assistants.md](.agents/ai-coding-assistants.md) | Policy for AI-assisted contributions — licensing, DCO, attribution |
| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist |
| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
@@ -13,6 +13,7 @@ Thank you for your interest in contributing to LocalAI! We appreciate your time
- [Development Workflow](#development-workflow)
- [Creating a Pull Request (PR)](#creating-a-pull-request-pr)
- [Coding Guidelines](#coding-guidelines)
- [AI Coding Assistants](#ai-coding-assistants)
- [Testing](#testing)
- [Documentation](#documentation)
- [Community and Communication](#community-and-communication)
@@ -185,7 +186,7 @@ Before jumping into a PR for a massive feature or big change, it is preferred to

This project uses an [`.editorconfig`](.editorconfig) file to define formatting standards (indentation, line endings, charset, etc.). Please configure your editor to respect it.

For AI-assisted development, see [`CLAUDE.md`](CLAUDE.md) for agent-specific guidelines including build instructions and backend architecture details.
For AI-assisted development, see [`AGENTS.md`](AGENTS.md) (or the equivalent [`CLAUDE.md`](CLAUDE.md) symlink) for agent-specific guidelines including build instructions and backend architecture details. Contributions produced with AI assistance must follow the rules in the [AI Coding Assistants](#ai-coding-assistants) section below.

### General Principles

@@ -211,6 +212,26 @@ For AI-assisted development, see [`CLAUDE.md`](CLAUDE.md) for agent-specific gui
- Reviewers will check for correctness, test coverage, adherence to these guidelines, and clarity of intent.
- Be responsive to review feedback and keep discussions constructive.

## AI Coding Assistants

LocalAI follows the **same guidelines as the Linux kernel project** for AI-assisted contributions: <https://docs.kernel.org/process/coding-assistants.html>.

The full policy for this repository lives in [`.agents/ai-coding-assistants.md`](.agents/ai-coding-assistants.md). Summary:

- **AI agents MUST NOT add `Signed-off-by` tags.** Only humans can certify the Developer Certificate of Origin.
- **AI agents MUST NOT add `Co-Authored-By` trailers** attributing themselves as co-authors.
- **Attribute AI involvement with an `Assisted-by` trailer** in the commit message:

```
Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
```

Example: `Assisted-by: Claude:claude-opus-4-7 golangci-lint`

Basic development tools (git, go, make, editors) should not be listed.
- **The human submitter is responsible** for reviewing, testing, and fully understanding every line of AI-generated code — including verifying that any referenced APIs, flags, or file paths actually exist in the tree.
- Contributions must remain compatible with LocalAI's **MIT License**.

## Testing

All new features and bug fixes should include test coverage. The project uses [Ginkgo](https://onsi.github.io/ginkgo/) as its test framework.
@@ -1,5 +1,5 @@

IK_LLAMA_VERSION?=8befd92ea5f702494ea9813fe42a52fb015db5fe
IK_LLAMA_VERSION?=d4824131580b94ffa7b0e91c955e2b237c2fe16e
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp

CMAKE_ARGS?=
@@ -1,5 +1,5 @@

LLAMA_VERSION?=4f02d4733934179386cbc15b3454be26237940bb
LLAMA_VERSION?=5a4cd6741fc33227cdacb329f355ab21f8481de2
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp

CMAKE_ARGS?=
@@ -1,7 +1,7 @@

# Pinned to the HEAD of feature/turboquant-kv-cache on https://github.com/TheTom/llama-cpp-turboquant.
# Auto-bumped nightly by .github/workflows/bump_deps.yaml.
TURBOQUANT_VERSION?=45f8a066ed5f5bb38c695cec532f6cef9f4efa9d
TURBOQUANT_VERSION?=4d24ad87b8ed2ad160809af41930f1e04b83f234
LLAMA_REPO?=https://github.com/TheTom/llama-cpp-turboquant

CMAKE_ARGS?=
@@ -1,13 +1,22 @@
#!/bin/bash
# Augment the shared backend/cpp/llama-cpp/grpc-server.cpp allow-list of KV-cache
# types so the gRPC `LoadModel` call accepts the TurboQuant-specific
# `turbo2` / `turbo3` / `turbo4` cache types.
# Patch the shared backend/cpp/llama-cpp/grpc-server.cpp *copy* used by the
# turboquant build to account for two gaps between upstream and the fork:
#
# We do this on the *copy* sitting in turboquant-<flavor>-build/, never on the
# original under backend/cpp/llama-cpp/, so the stock llama-cpp build keeps
# compiling against vanilla upstream which does not know about GGML_TYPE_TURBO*.
# 1. Augment the kv_cache_types[] allow-list so `LoadModel` accepts the
# fork-specific `turbo2` / `turbo3` / `turbo4` cache types.
# 2. Replace `get_media_marker()` (added upstream in ggml-org/llama.cpp#21962,
# server-side random per-instance marker) with the legacy "<__media__>"
# literal. The fork branched before that PR, so server-common.cpp has no
# get_media_marker symbol. The fork's mtmd_default_marker() still returns
# "<__media__>", and Go-side tooling falls back to that sentinel when the
# backend does not expose media_marker, so substituting the literal keeps
# behavior identical on the turboquant path.
#
# Idempotent: skips the insertion if the marker is already present (so re-runs
# We patch the *copy* sitting in turboquant-<flavor>-build/, never the original
# under backend/cpp/llama-cpp/, so the stock llama-cpp build keeps compiling
# against vanilla upstream.
#
# Idempotent: skips each insertion if its marker is already present (so re-runs
# of the same build dir don't double-insert).

set -euo pipefail
@@ -25,33 +34,47 @@ if [[ ! -f "$SRC" ]]; then
fi

if grep -q 'GGML_TYPE_TURBO2_0' "$SRC"; then
echo "==> $SRC already has TurboQuant cache types, skipping"
exit 0
echo "==> $SRC already has TurboQuant cache types, skipping KV allow-list patch"
else
echo "==> patching $SRC to allow turbo2/turbo3/turbo4 KV-cache types"

# Insert the three TURBO entries right after the first ` GGML_TYPE_Q5_1,`
# line (the kv_cache_types[] allow-list). Using awk because the builder image
# does not ship python3, and GNU sed's multi-line `a\` quoting is awkward.
awk '
/^ GGML_TYPE_Q5_1,$/ && !done {
print
print " // turboquant fork extras — added by patch-grpc-server.sh"
print " GGML_TYPE_TURBO2_0,"
print " GGML_TYPE_TURBO3_0,"
print " GGML_TYPE_TURBO4_0,"
done = 1
next
}
{ print }
END {
if (!done) {
print "patch-grpc-server.sh: anchor ` GGML_TYPE_Q5_1,` not found" > "/dev/stderr"
exit 1
}
}
' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"

echo "==> KV allow-list patch OK"
fi

echo "==> patching $SRC to allow turbo2/turbo3/turbo4 KV-cache types"
if grep -q 'get_media_marker()' "$SRC"; then
echo "==> patching $SRC to replace get_media_marker() with legacy \"<__media__>\" literal"
# Only one call site today (ModelMetadata), but replace all occurrences to
# stay robust if upstream adds more. Use a temp file to avoid relying on
# sed -i portability (the builder image uses GNU sed, but keeping this
# consistent with the awk block above).
sed 's/get_media_marker()/"<__media__>"/g' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"
echo "==> get_media_marker() substitution OK"
else
echo "==> $SRC has no get_media_marker() call, skipping media-marker patch"
fi

# Insert the three TURBO entries right after the first ` GGML_TYPE_Q5_1,`
# line (the kv_cache_types[] allow-list). Using awk because the builder image
# does not ship python3, and GNU sed's multi-line `a\` quoting is awkward.
awk '
/^ GGML_TYPE_Q5_1,$/ && !done {
print
print " // turboquant fork extras — added by patch-grpc-server.sh"
print " GGML_TYPE_TURBO2_0,"
print " GGML_TYPE_TURBO3_0,"
print " GGML_TYPE_TURBO4_0,"
done = 1
next
}
{ print }
END {
if (!done) {
print "patch-grpc-server.sh: anchor ` GGML_TYPE_Q5_1,` not found" > "/dev/stderr"
exit 1
}
}
' "$SRC" > "$SRC.tmp"
mv "$SRC.tmp" "$SRC"

echo "==> patched OK"
echo "==> all patches applied"
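The idempotent insert that this script relies on (guard with `grep`, insert after an anchor with `awk`, fail if the anchor is missing) can be tried in isolation on a scratch file. This is a sketch only: `patch_allow_list` is a made-up name, the indentation of the inserted lines is assumed, and the anchor pattern tolerates leading blanks because the exact whitespace of the real anchor is not recoverable from this view.

```shell
#!/bin/sh
# Sketch of the guard + anchor-insert pattern; not the project's script.
patch_allow_list() {
  src="$1"
  # Guard: a previous run already inserted the entries, so do nothing.
  if grep -q 'GGML_TYPE_TURBO2_0' "$src"; then
    return 0
  fi
  # Insert three lines after the first anchor match; exit 1 if no anchor.
  # Assumption: the anchor line may be indented with spaces or tabs.
  if awk '
    /^[ \t]*GGML_TYPE_Q5_1,$/ && !done {
      print
      print "    GGML_TYPE_TURBO2_0,"
      print "    GGML_TYPE_TURBO3_0,"
      print "    GGML_TYPE_TURBO4_0,"
      done = 1
      next
    }
    { print }
    END { if (!done) exit 1 }
  ' "$src" > "$src.tmp"; then
    mv "$src.tmp" "$src"
  else
    # Leave the original file untouched when the anchor was not found.
    rm -f "$src.tmp"
    return 1
  fi
}
```

Writing to a temporary file and renaming it only on success means a missing anchor never leaves a half-written source file behind, which is the same reason the script avoids editing in place.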
@@ -0,0 +1,47 @@
From: LocalAI turboquant backend maintainers <noreply@localai.io>
Subject: ggml-hip: add F16-K + TURBO-V fattn-vec template instances

Upstream commit fa4e8be0a0ce ("fix(cuda): add F16-K + TURBO-V dispatch cases
in fattn.cu") added three new template instance files under ggml-cuda/:

- fattn-vec-instance-f16-turbo2_0.cu
- fattn-vec-instance-f16-turbo3_0.cu
- fattn-vec-instance-f16-turbo4_0.cu

and registered them in ggml/src/ggml-cuda/CMakeLists.txt. The companion
dispatch cases FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO{2,3,4}_0)
were added to ggml/src/ggml-cuda/fattn.cu, which is shared with the HIP
build path via hipify.

However, ggml/src/ggml-hip/CMakeLists.txt carries its own explicit list of
template instance sources (used when GGML_CUDA_FA_ALL_QUANTS is OFF, which
is the default) and was never updated for the new F16-K + TURBO-V combos.
The HIP build therefore compiles the dispatch cases (which reference
ggml_cuda_flash_attn_ext_vec_case<D, F16, TURBO*>) without ever compiling
the matching template instantiations, causing a link-time failure in the
-gpu-rocm-hipblas-turboquant CI job.

Add the three new template instance files to ggml-hip's list so the HIP
build links cleanly. Drop this patch once the fork picks up the
corresponding upstream sync in ggml-hip/CMakeLists.txt.

--- a/ggml/src/ggml-hip/CMakeLists.txt
+++ b/ggml/src/ggml-hip/CMakeLists.txt
@@ -85,14 +85,17 @@ else()
../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo3_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-q8_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo3_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo3_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo2_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-q8_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo2_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo2_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo2_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo2_0-turbo3_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo4_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-q8_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-q8_0-turbo4_0.cu
+ ../ggml-cuda/template-instances/fattn-vec-instance-f16-turbo4_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo3_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo3_0-turbo4_0.cu
../ggml-cuda/template-instances/fattn-vec-instance-turbo4_0-turbo2_0.cu
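The drift this patch describes, where one build file names template instances that the other omits, is the kind of thing a small comparison can surface before link time. The sketch below is hypothetical and not part of the fork or of LocalAI. It assumes both files mention each instance by its literal `fattn-vec-instance-*.cu` file name, which may not hold if a list is built from a glob, and `missing_instances` is an invented name.

```shell
#!/bin/sh
# Hypothetical drift check between two source lists; illustrative only.

# Print instance file names that appear in the first file ($1) but not in
# the second ($2). Prints nothing when the second list covers the first.
missing_instances() {
  pattern='fattn-vec-instance-[a-z0-9_-]+\.cu'
  hip_names=$(mktemp)
  grep -oE "$pattern" "$2" | sort -u > "$hip_names"
  # -v -x -F -f: keep names with no exact, literal match in the second list.
  grep -oE "$pattern" "$1" | sort -u | grep -vxF -f "$hip_names" || true
  rm -f "$hip_names"
}
```

Run against the two `CMakeLists.txt` files named in the commit message, any output would be a candidate for the kind of addition this patch makes by hand.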
@@ -1,83 +0,0 @@
From 660600081fb7b9b769ded5c805a2d39a419f0a0d Mon Sep 17 00:00:00 2001
From: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Date: Wed, 8 Apr 2026 11:12:15 -0400
Subject: [PATCH] server: respect the ignore eos flag (#21203)

---
tools/server/server-context.cpp | 3 +++
tools/server/server-context.h | 3 +++
tools/server/server-task.cpp | 3 ++-
tools/server/server-task.h | 1 +
4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/server/server-context.cpp b/tools/server/server-context.cpp
index 9d3ac538..b31981c5 100644
--- a/tools/server/server-context.cpp
+++ b/tools/server/server-context.cpp
@@ -3033,6 +3033,8 @@ server_context_meta server_context::get_meta() const {
/* fim_rep_token */ llama_vocab_fim_rep(impl->vocab),
/* fim_sep_token */ llama_vocab_fim_sep(impl->vocab),

+ /* logit_bias_eog */ impl->params_base.sampling.logit_bias_eog,
+
/* model_vocab_type */ llama_vocab_type(impl->vocab),
/* model_vocab_n_tokens */ llama_vocab_n_tokens(impl->vocab),
/* model_n_ctx_train */ llama_model_n_ctx_train(impl->model),
@@ -3117,6 +3119,7 @@ std::unique_ptr<server_res_generator> server_routes::handle_completions_impl(
ctx_server.vocab,
params,
meta->slot_n_ctx,
+ meta->logit_bias_eog,
data);
task.id_slot = json_value(data, "id_slot", -1);

diff --git a/tools/server/server-context.h b/tools/server/server-context.h
index d7ce8735..6ea9afc0 100644
--- a/tools/server/server-context.h
+++ b/tools/server/server-context.h
@@ -39,6 +39,9 @@ struct server_context_meta {
llama_token fim_rep_token;
llama_token fim_sep_token;

+ // sampling
+ std::vector<llama_logit_bias> logit_bias_eog;
+
// model meta
enum llama_vocab_type model_vocab_type;
int32_t model_vocab_n_tokens;
diff --git a/tools/server/server-task.cpp b/tools/server/server-task.cpp
index 4cc87bc5..856b3f0e 100644
--- a/tools/server/server-task.cpp
+++ b/tools/server/server-task.cpp
@@ -239,6 +239,7 @@ task_params server_task::params_from_json_cmpl(
const llama_vocab * vocab,
const common_params & params_base,
const int n_ctx_slot,
+ const std::vector<llama_logit_bias> & logit_bias_eog,
const json & data) {
task_params params;

@@ -562,7 +563,7 @@ task_params server_task::params_from_json_cmpl(
if (params.sampling.ignore_eos) {
params.sampling.logit_bias.insert(
params.sampling.logit_bias.end(),
- defaults.sampling.logit_bias_eog.begin(), defaults.sampling.logit_bias_eog.end());
+ logit_bias_eog.begin(), logit_bias_eog.end());
}
}

diff --git a/tools/server/server-task.h b/tools/server/server-task.h
index d855bf08..243e47a8 100644
--- a/tools/server/server-task.h
+++ b/tools/server/server-task.h
@@ -209,6 +209,7 @@ struct server_task {
const llama_vocab * vocab,
const common_params & params_base,
const int n_ctx_slot,
+ const std::vector<llama_logit_bias> & logit_bias_eog,
const json & data);

// utility function
--
2.43.0

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=7d33d4b2ddeafa672761a5880ec33bdff452504d
STABLEDIFFUSION_GGML_VERSION?=44cca3d626d301e2215d5e243277e8f0e65bfa78

CMAKE_ARGS+=-DGGML_MAX_NAME=128

@@ -1106,6 +1106,11 @@ static int ffmpeg_mux_raw_to_mp4(sd_image_t* frames, int num_frames, int fps, co
const_cast<char*>("-c:v"), const_cast<char*>("libx264"),
const_cast<char*>("-pix_fmt"), const_cast<char*>("yuv420p"),
const_cast<char*>("-movflags"), const_cast<char*>("+faststart"),
// Force MP4 container. Distributed LocalAI hands us a staging
// path (e.g. /staging/localai-output-NNN.tmp) with a non-standard
// extension; relying on filename suffix makes ffmpeg bail with
// "Unable to choose an output format".
const_cast<char*>("-f"), const_cast<char*>("mp4"),
const_cast<char*>(dst),
nullptr
};
@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)

# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=166c20b473d5f4d04052e699f992f625ea2a2fdd
WHISPER_CPP_VERSION?=fc674574ca27cac59a15e5b22a09b9d9ad62aafe
SO_TARGET?=libgowhisper.so

CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF
@@ -587,7 +587,6 @@
alias: "whisperx"
capabilities:
nvidia: "cuda12-whisperx"
amd: "rocm-whisperx"
metal: "metal-whisperx"
default: "cpu-whisperx"
nvidia-cuda-13: "cuda13-whisperx"
@@ -1008,6 +1007,20 @@
nvidia-cuda-12: "cuda12-turboquant-development"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-turboquant-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-turboquant-development"
- !!merge <<: *stablediffusionggml
name: "stablediffusion-ggml-development"
capabilities:
default: "cpu-stablediffusion-ggml-development"
nvidia: "cuda12-stablediffusion-ggml-development"
intel: "intel-sycl-f16-stablediffusion-ggml-development"
# amd: "rocm-stablediffusion-ggml-development"
vulkan: "vulkan-stablediffusion-ggml-development"
nvidia-l4t: "nvidia-l4t-arm64-stablediffusion-ggml-development"
metal: "metal-stablediffusion-ggml-development"
nvidia-cuda-13: "cuda13-stablediffusion-ggml-development"
nvidia-cuda-12: "cuda12-stablediffusion-ggml-development"
nvidia-l4t-cuda-12: "nvidia-l4t-arm64-stablediffusion-ggml-development"
nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-stablediffusion-ggml-development"
- !!merge <<: *neutts
name: "cpu-neutts"
uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-neutts"
@@ -2731,7 +2744,6 @@
name: "whisperx-development"
capabilities:
nvidia: "cuda12-whisperx-development"
amd: "rocm-whisperx-development"
metal: "metal-whisperx-development"
default: "cpu-whisperx-development"
nvidia-cuda-13: "cuda13-whisperx-development"
@@ -2757,16 +2769,6 @@
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-whisperx"
mirrors:
- localai/localai-backends:master-gpu-nvidia-cuda-12-whisperx
- !!merge <<: *whisperx
name: "rocm-whisperx"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-whisperx"
mirrors:
- localai/localai-backends:latest-gpu-rocm-hipblas-whisperx
- !!merge <<: *whisperx
name: "rocm-whisperx-development"
uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-whisperx"
mirrors:
- localai/localai-backends:master-gpu-rocm-hipblas-whisperx
- !!merge <<: *whisperx
name: "cuda13-whisperx"
uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-whisperx"
@@ -1,6 +0,0 @@
|
||||
# whisperx hard-pins torch~=2.8.0, which is not available in the rocm7.x indexes
|
||||
# (they start at torch 2.10). Keep rocm6.4 wheels here — they still load against
|
||||
# the rocm7.2.1 runtime via AMD's forward-compatibility window.
|
||||
--extra-index-url https://download.pytorch.org/whl/rocm6.4
|
||||
torch==2.8.0+rocm6.4
|
||||
whisperx @ git+https://github.com/m-bain/whisperX.git
|
||||
@@ -341,6 +341,16 @@ impl Backend for KokorosService {
|
||||
Err(Status::unimplemented("Not supported"))
|
||||
}
|
||||
|
||||
type AudioTranscriptionStreamStream =
|
||||
ReceiverStream<Result<backend::TranscriptStreamResponse, Status>>;
|
||||
|
||||
async fn audio_transcription_stream(
|
||||
&self,
|
||||
_: Request<backend::TranscriptRequest>,
|
||||
) -> Result<Response<Self::AudioTranscriptionStreamStream>, Status> {
|
||||
Err(Status::unimplemented("Not supported"))
|
||||
}
|
||||
|
||||
async fn sound_generation(
|
||||
&self,
|
||||
_: Request<backend::SoundGenerationRequest>,
|
||||
|
||||
@@ -242,14 +242,20 @@ func initDistributed(cfg *config.ApplicationConfig, authDB *gorm.DB) (*Distribut
 DB: authDB,
 })

-// Create ReplicaReconciler for auto-scaling model replicas
+// Create ReplicaReconciler for auto-scaling model replicas. Adapter +
+// RegistrationToken feed the state-reconciliation passes: pending op
+// drain uses the adapter, and model health probes use the token to auth
+// against workers' gRPC HealthCheck.
 reconciler := nodes.NewReplicaReconciler(nodes.ReplicaReconcilerOptions{
-Registry: registry,
-Scheduler: router,
-Unloader: remoteUnloader,
-DB: authDB,
-Interval: 30 * time.Second,
-ScaleDownDelay: 5 * time.Minute,
+Registry: registry,
+Scheduler: router,
+Unloader: remoteUnloader,
+Adapter: remoteUnloader,
+RegistrationToken: cfg.Distributed.RegistrationToken,
+DB: authDB,
+Interval: 30 * time.Second,
+ScaleDownDelay: 5 * time.Minute,
+ProbeStaleAfter: 2 * time.Minute,
 })

 // Create ModelRouterAdapter to wire into ModelLoader
|
||||
@@ -235,7 +235,12 @@ func New(opts ...config.AppOption) (*Application, error) {
 // In distributed mode, uses PostgreSQL advisory lock so only one frontend
 // instance runs periodic checks (avoids duplicate upgrades across replicas).
 if len(options.BackendGalleries) > 0 {
-uc := NewUpgradeChecker(options, application.ModelLoader(), application.distributedDB())
+// Pass a lazy getter for the backend manager so the checker always
+// uses the active one — DistributedBackendManager is swapped in above
+// and asks workers for their installed backends, which is what
+// upgrade detection needs in distributed mode.
+bmFn := func() galleryop.BackendManager { return application.GalleryService().BackendManager() }
+uc := NewUpgradeChecker(options, application.ModelLoader(), application.distributedDB(), bmFn)
 application.upgradeChecker = uc
 go uc.Run(options.Context)
 }
|
||||
@@ -8,6 +8,7 @@ import (
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
"github.com/mudler/LocalAI/core/gallery"
|
||||
"github.com/mudler/LocalAI/core/services/advisorylock"
|
||||
"github.com/mudler/LocalAI/core/services/galleryop"
|
||||
"github.com/mudler/LocalAI/pkg/model"
|
||||
"github.com/mudler/LocalAI/pkg/system"
|
||||
"github.com/mudler/xlog"
|
||||
@@ -26,6 +27,12 @@ type UpgradeChecker struct {
|
||||
galleries []config.Gallery
|
||||
systemState *system.SystemState
|
||||
db *gorm.DB // non-nil in distributed mode
|
||||
// backendManagerFn lazily returns the current backend manager (may be
|
||||
// swapped from Local to Distributed after startup). Pulled through each
|
||||
// check so the UpgradeChecker uses whichever is active. In distributed
|
||||
// mode this ensures CheckUpgrades asks workers instead of the (empty)
|
||||
// frontend filesystem — fixing the bug where upgrades never surfaced.
|
||||
backendManagerFn func() galleryop.BackendManager
|
||||
|
||||
checkInterval time.Duration
|
||||
stop chan struct{}
|
||||
@@ -40,18 +47,22 @@ type UpgradeChecker struct {
 // NewUpgradeChecker creates a new UpgradeChecker service.
 // Pass db=nil for standalone mode, or a *gorm.DB for distributed mode
 // (uses advisory locks so only one instance runs periodic checks).
-func NewUpgradeChecker(appConfig *config.ApplicationConfig, ml *model.ModelLoader, db *gorm.DB) *UpgradeChecker {
+// backendManagerFn is optional; when set, CheckUpgrades is routed through
+// the active backend manager — required in distributed mode so the check
+// aggregates from workers rather than the empty frontend filesystem.
+func NewUpgradeChecker(appConfig *config.ApplicationConfig, ml *model.ModelLoader, db *gorm.DB, backendManagerFn func() galleryop.BackendManager) *UpgradeChecker {
 return &UpgradeChecker{
-appConfig: appConfig,
-modelLoader: ml,
-galleries: appConfig.BackendGalleries,
-systemState: appConfig.SystemState,
-db: db,
-checkInterval: 6 * time.Hour,
-stop: make(chan struct{}),
-done: make(chan struct{}),
-triggerCh: make(chan struct{}, 1),
-lastUpgrades: make(map[string]gallery.UpgradeInfo),
+appConfig: appConfig,
+modelLoader: ml,
+galleries: appConfig.BackendGalleries,
+systemState: appConfig.SystemState,
+db: db,
+backendManagerFn: backendManagerFn,
+checkInterval: 6 * time.Hour,
+stop: make(chan struct{}),
+done: make(chan struct{}),
+triggerCh: make(chan struct{}, 1),
+lastUpgrades: make(map[string]gallery.UpgradeInfo),
 }
 }

@@ -64,13 +75,16 @@ func NewUpgradeChecker(appConfig *config.ApplicationConfig, ml *model.ModelLoade
 func (uc *UpgradeChecker) Run(ctx context.Context) {
 defer close(uc.done)

-// Initial delay: don't slow down startup
+// Initial delay: don't slow down startup. Short enough that operators
+// don't stare at an empty upgrade banner for long; long enough that
+// workers have registered and reported their installed backends.
+initialDelay := 10 * time.Second
 select {
 case <-ctx.Done():
 return
 case <-uc.stop:
 return
-case <-time.After(30 * time.Second):
+case <-time.After(initialDelay):
 }

 // First check always runs locally (to warm the cache on this instance)
|
||||
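The startup wait in the hunk above can be sketched in isolation. This is a minimal sketch rather than the project's code: `waitBeforeFirstCheck` and `demoWait` are hypothetical names, and it uses `time.NewTimer` with a deferred `Stop` where the diff uses `time.After`, so the timer is released when the wait is interrupted early.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitBeforeFirstCheck blocks for delay unless ctx is cancelled or stop is
// closed first. It reports whether the caller should go on to run the check.
// (Hypothetical helper; the diff inlines this select in Run.)
func waitBeforeFirstCheck(ctx context.Context, stop <-chan struct{}, delay time.Duration) bool {
	timer := time.NewTimer(delay)
	defer timer.Stop()
	select {
	case <-ctx.Done():
		return false
	case <-stop:
		return false
	case <-timer.C:
		return true
	}
}

// demoWait runs the wait once, optionally with the stop channel already closed.
func demoWait(closeStop bool) bool {
	stop := make(chan struct{})
	delay := 20 * time.Millisecond
	if closeStop {
		close(stop)
		delay = time.Hour // never reached: stop wins immediately
	}
	return waitBeforeFirstCheck(context.Background(), stop, delay)
}

func main() {
	fmt.Println("uninterrupted wait proceeds:", demoWait(false))
	fmt.Println("stopped wait proceeds:", demoWait(true))
}
```

Naming the delay, as the diff does with `initialDelay`, keeps the trade-off described in the comment next to the value it justifies.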
@@ -144,7 +158,18 @@ func (uc *UpgradeChecker) GetAvailableUpgrades() map[string]gallery.UpgradeInfo
 }

 func (uc *UpgradeChecker) runCheck(ctx context.Context) {
-upgrades, err := gallery.CheckBackendUpgrades(ctx, uc.galleries, uc.systemState)
+var (
+upgrades map[string]gallery.UpgradeInfo
+err error
+)
+if uc.backendManagerFn != nil {
+if bm := uc.backendManagerFn(); bm != nil {
+upgrades, err = bm.CheckUpgrades(ctx)
+}
+}
+if upgrades == nil && err == nil {
+upgrades, err = gallery.CheckBackendUpgrades(ctx, uc.galleries, uc.systemState)
+}

 uc.mu.Lock()
 uc.lastCheckTime = time.Now()
|
||||
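The "ask the active manager first, fall back to the local check" flow of `runCheck` can be illustrated with a self-contained sketch. All types here (`upgradeInfo`, `backendManager`, `workerManager`, `localCheck`) are hypothetical stand-ins for the `gallery` and `galleryop` types named in the diff; only the control flow is meant to match.

```go
package main

import (
	"context"
	"fmt"
)

// upgradeInfo and backendManager are hypothetical stand-ins for
// gallery.UpgradeInfo and galleryop.BackendManager.
type upgradeInfo struct{ Available string }

type backendManager interface {
	CheckUpgrades(ctx context.Context) (map[string]upgradeInfo, error)
}

// workerManager simulates a manager that aggregates results from workers.
type workerManager struct{}

func (workerManager) CheckUpgrades(context.Context) (map[string]upgradeInfo, error) {
	return map[string]upgradeInfo{"llama-cpp": {Available: "v2"}}, nil
}

// localCheck stands in for the local filesystem check.
func localCheck(context.Context) (map[string]upgradeInfo, error) {
	return map[string]upgradeInfo{}, nil
}

// checkUpgrades resolves the manager lazily on every call and falls back to
// the local check only when the manager yielded neither a result nor an error.
func checkUpgrades(ctx context.Context, getManager func() backendManager) (map[string]upgradeInfo, string, error) {
	var (
		upgrades map[string]upgradeInfo
		err      error
	)
	source := "manager"
	if getManager != nil {
		if bm := getManager(); bm != nil {
			upgrades, err = bm.CheckUpgrades(ctx)
		}
	}
	if upgrades == nil && err == nil {
		source = "local"
		upgrades, err = localCheck(ctx)
	}
	return upgrades, source, err
}

// sourceFor reports which path answered the check.
func sourceFor(getManager func() backendManager) string {
	_, source, _ := checkUpgrades(context.Background(), getManager)
	return source
}

func main() {
	fmt.Println(sourceFor(nil))
	fmt.Println(sourceFor(func() backendManager { return workerManager{} }))
}
```

One consequence of this shape: a manager that returns a nil map with a nil error also triggers the local fallback, while a manager error does not.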
@@ -40,6 +40,12 @@ type TokenUsage struct {
|
||||
ChatDeltas []*proto.ChatDelta // per-chunk deltas from C++ autoparser (only set during streaming)
|
||||
}
|
||||
|
||||
func needsThinkingProbe(c *config.ModelConfig) bool {
|
||||
return c.TemplateConfig.UseTokenizerTemplate &&
|
||||
(c.ReasoningConfig.DisableReasoning == nil ||
|
||||
c.ReasoningConfig.DisableReasoningTagPrefill == nil)
|
||||
}
|
||||
|
||||
// HasChatDeltaContent returns true if any chat delta carries content or reasoning text.
|
||||
// Used to decide whether to prefer C++ autoparser deltas over Go-side tag extraction.
|
||||
func (t TokenUsage) HasChatDeltaContent() bool {
|
||||
@@ -100,11 +106,9 @@ func ModelInference(ctx context.Context, s string, messages schema.Messages, ima
 // tokenizer template path is active) and the multimodal media marker (needed
 // by custom chat templates so markers line up with what mtmd expects).
 // We probe whenever any of those slots is still empty.
-needsThinkingProbe := c.TemplateConfig.UseTokenizerTemplate &&
-c.ReasoningConfig.DisableReasoning == nil &&
-c.ReasoningConfig.DisableReasoningTagPrefill == nil
+shouldProbeThinking := needsThinkingProbe(c)
 needsMarkerProbe := c.MediaMarker == ""
-if needsThinkingProbe || needsMarkerProbe {
+if shouldProbeThinking || needsMarkerProbe {
 modelOpts := grpcModelOpts(*c, o.SystemState.Model.ModelsPath)
 config.DetectThinkingSupportFromBackend(ctx, c, inferenceModel, modelOpts)
 // Update the config in the loader so it persists for future requests
|
||||
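The behavioural change in this file is the switch from "both reasoning flags unset" to "either flag unset". A small sketch makes the difference visible for a partially configured model. The `reasoningFlags` struct and the `probeOld`/`probeNew` names are illustrative assumptions, not identifiers from the repository.

```go
package main

import "fmt"

// reasoningFlags is a hypothetical stand-in for the two optional settings;
// nil means "not set by the user and not yet detected".
type reasoningFlags struct {
	DisableReasoning           *bool
	DisableReasoningTagPrefill *bool
}

// probeOld mirrors the removed inline condition: probe only when both are unset.
func probeOld(useTokenizerTemplate bool, f reasoningFlags) bool {
	return useTokenizerTemplate &&
		f.DisableReasoning == nil &&
		f.DisableReasoningTagPrefill == nil
}

// probeNew mirrors the added helper: probe while either one is still unset.
func probeNew(useTokenizerTemplate bool, f reasoningFlags) bool {
	return useTokenizerTemplate &&
		(f.DisableReasoning == nil || f.DisableReasoningTagPrefill == nil)
}

func main() {
	disabled := true
	// The user set only one of the two flags in YAML.
	partial := reasoningFlags{DisableReasoning: &disabled}
	fmt.Println("old condition probes:", probeOld(true, partial))
	fmt.Println("new condition probes:", probeNew(true, partial))
}
```

With the old condition a partially configured model was never probed, so its remaining flag stayed nil; the new condition keeps probing until both slots are filled.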
29
core/backend/llm_probe_test.go
Normal file
@@ -0,0 +1,29 @@
|
||||
package backend
|
||||
|
||||
import (
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
|
||||
"github.com/gpustack/gguf-parser-go/util/ptr"
|
||||
. "github.com/onsi/ginkgo/v2"
|
||||
. "github.com/onsi/gomega"
|
||||
)
|
||||
|
||||
var _ = Describe("thinking probe gating", func() {
|
||||
It("probes tokenizer-template models when any reasoning default is still unset", func() {
|
||||
cfg := &config.ModelConfig{
|
||||
TemplateConfig: config.TemplateConfig{UseTokenizerTemplate: true},
|
||||
}
|
||||
Expect(needsThinkingProbe(cfg)).To(BeTrue())
|
||||
|
||||
cfg.ReasoningConfig.DisableReasoning = ptr.To(true)
|
||||
Expect(needsThinkingProbe(cfg)).To(BeTrue())
|
||||
|
||||
cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
|
||||
Expect(needsThinkingProbe(cfg)).To(BeFalse())
|
||||
})
|
||||
|
||||
It("does not probe when tokenizer templates are disabled", func() {
|
||||
cfg := &config.ModelConfig{}
|
||||
Expect(needsThinkingProbe(cfg)).To(BeFalse())
|
||||
})
|
||||
})
|
||||
@@ -507,7 +507,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {

 app, err := application.New(opts...)
 if err != nil {
-return fmt.Errorf("failed basic startup tasks with error %s", err.Error())
+return fmt.Errorf("LocalAI failed to start: %w.\nTroubleshooting steps:\n 1. Check that your models directory exists and is accessible: %s\n 2. Verify model config files are valid YAML: 'local-ai util usecase-heuristic <config>'\n 3. Check available disk space and file permissions\n 4. Run with --log-level=debug for more details\nSee https://localai.io/basics/troubleshooting/ for more help", err, r.ModelsPath)
 }

 appHTTP, err := http.API(app)
|
||||
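Besides the longer message, the hunk above switches from `%s` with `err.Error()` to `%w`, which keeps the original error in the chain. A minimal sketch of what that preserves, using a made-up message and `fs.ErrNotExist` as an example cause (neither is taken from the repository):

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// wrapStartup wraps the cause with %w, so callers can still inspect it.
func wrapStartup(cause error, modelsPath string) error {
	return fmt.Errorf("failed to start: %w.\nCheck that the models directory exists: %s", cause, modelsPath)
}

// flattenStartup formats the cause as text only, which drops the chain.
func flattenStartup(cause error) error {
	return fmt.Errorf("failed basic startup tasks with error %s", cause.Error())
}

// isMissingPath reports whether err wraps fs.ErrNotExist.
func isMissingPath(err error) bool { return errors.Is(err, fs.ErrNotExist) }

func demoWrapped() error   { return wrapStartup(fs.ErrNotExist, "/models") }
func demoFlattened() error { return flattenStartup(fs.ErrNotExist) }

func main() {
	fmt.Println("wrapped keeps cause:", isMissingPath(demoWrapped()))
	fmt.Println("flattened keeps cause:", isMissingPath(demoFlattened()))
}
```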
@@ -3,7 +3,6 @@ package cli
 import (
 "context"
 "encoding/json"
-"errors"
 "fmt"
 "strings"

@@ -60,7 +59,7 @@ func (t *TranscriptCMD) Run(ctx *cliContext.Context) error {

 c, exists := cl.GetModelConfig(t.Model)
 if !exists {
-return errors.New("model not found")
+return fmt.Errorf("model %q not found. Run 'local-ai models list' to see available models, or install one with 'local-ai models install <model>'. See https://localai.io/models/ for more information", t.Model)
 }

 c.Threads = &t.Threads
|
||||
@@ -74,7 +74,7 @@ func (u *CreateOCIImageCMD) Run(ctx *cliContext.Context) error {

 func (u *GGUFInfoCMD) Run(ctx *cliContext.Context) error {
 if len(u.Args) == 0 {
-return fmt.Errorf("no GGUF file provided")
+return fmt.Errorf("no GGUF file provided. Usage: local-ai util gguf-info <path-to-file.gguf>\nGGUF is a binary format for storing quantized language models. You can download GGUF models from https://huggingface.co or install one with 'local-ai models install <model>'")
 }
 // We try to guess only if we don't have a template defined already
 f, err := gguf.ParseGGUFFile(u.Args[0])
|
||||
@@ -21,6 +21,7 @@ import (
|
||||
"github.com/mudler/LocalAI/core/cli/workerregistry"
|
||||
"github.com/mudler/LocalAI/core/config"
|
||||
"github.com/mudler/LocalAI/core/gallery"
|
||||
"github.com/mudler/LocalAI/core/services/galleryop"
|
||||
"github.com/mudler/LocalAI/core/services/messaging"
|
||||
"github.com/mudler/LocalAI/core/services/nodes"
|
||||
"github.com/mudler/LocalAI/core/services/storage"
|
||||
@@ -597,12 +598,20 @@ func (s *backendSupervisor) installBackend(req messaging.BackendInstallRequest)
 // Try to find the backend binary
 backendPath := s.findBackend(req.Backend)
 if backendPath == "" {
-// Backend not found locally — try auto-installing from gallery
-xlog.Info("Backend not found locally, attempting gallery install", "backend", req.Backend)
-if err := gallery.InstallBackendFromGallery(
-context.Background(), galleries, s.systemState, s.ml, req.Backend, nil, false,
-); err != nil {
-return "", fmt.Errorf("installing backend from gallery: %w", err)
+if req.URI != "" {
+xlog.Info("Backend not found locally, attempting external install", "backend", req.Backend, "uri", req.URI)
+if err := galleryop.InstallExternalBackend(
+context.Background(), galleries, s.systemState, s.ml, nil, req.URI, req.Name, req.Alias,
+); err != nil {
+return "", fmt.Errorf("installing backend from gallery: %w", err)
+}
+} else {
+xlog.Info("Backend not found locally, attempting gallery install", "backend", req.Backend)
+if err := gallery.InstallBackendFromGallery(
+context.Background(), galleries, s.systemState, s.ml, req.Backend, nil, false,
+); err != nil {
+return "", fmt.Errorf("installing backend from gallery: %w", err)
+}
 }
 // Re-register after install and retry
 gallery.RegisterBackends(s.systemState, s.ml)
@@ -738,6 +747,9 @@ func (s *backendSupervisor) subscribeLifecycleEvents() {
|
||||
if b.Metadata != nil {
|
||||
info.InstalledAt = b.Metadata.InstalledAt
|
||||
info.GalleryURL = b.Metadata.GalleryURL
|
||||
info.Version = b.Metadata.Version
|
||||
info.URI = b.Metadata.URI
|
||||
info.Digest = b.Metadata.Digest
|
||||
}
|
||||
infos = append(infos, info)
|
||||
}
|
||||
|
||||
@@ -38,7 +38,7 @@ func (r *P2P) Run(ctx *cliContext.Context) error {
 // Check if the token is set
 // as we always need it.
 if r.Token == "" {
-return fmt.Errorf("Token is required")
+return fmt.Errorf("a P2P token is required to join the network. Set it via the LOCALAI_TOKEN environment variable or the --token flag. You can generate a token by running 'local-ai run --p2p' on the main node. See https://localai.io/features/distribute/ for more information")
 }

 port, err := freeport.GetFreePort()
|
||||
@@ -125,19 +125,7 @@ func DetectThinkingSupportFromBackend(ctx context.Context, cfg *ModelConfig, bac
 return
 }

-cfg.ReasoningConfig.DisableReasoning = ptr.To(!metadata.SupportsThinking)
-
-// Use the rendered template to detect if thinking token is at the end
-// This reuses the existing DetectThinkingStartToken function
-if metadata.RenderedTemplate != "" {
-thinkingStartToken := reasoning.DetectThinkingStartToken(metadata.RenderedTemplate, &cfg.ReasoningConfig)
-thinkingForcedOpen := thinkingStartToken != ""
-cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(!thinkingForcedOpen)
-xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", thinkingForcedOpen, "thinking_start_token", thinkingStartToken)
-} else {
-cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
-xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", false)
-}
+applyDetectedThinkingConfig(cfg, metadata)

 // Extract tool format markers from autoparser analysis
 if tf := metadata.GetToolFormat(); tf != nil && tf.FormatType != "" {
@@ -180,3 +168,34 @@ func DetectThinkingSupportFromBackend(ctx context.Context, cfg *ModelConfig, bac
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func applyDetectedThinkingConfig(cfg *ModelConfig, metadata *pb.ModelMetadataResponse) {
|
||||
if cfg == nil || metadata == nil {
|
||||
return
|
||||
}
|
||||
|
||||
// Respect explicit YAML/user config. Backend probing should only fill defaults
|
||||
// when the reasoning mode has not already been set.
|
||||
if cfg.ReasoningConfig.DisableReasoning == nil {
|
||||
cfg.ReasoningConfig.DisableReasoning = ptr.To(!metadata.SupportsThinking)
|
||||
}
|
||||
|
||||
// Respect explicit prefill config for the same reason. Only infer the
|
||||
// default prefill behavior when the user did not set it.
|
||||
if cfg.ReasoningConfig.DisableReasoningTagPrefill == nil {
|
||||
// Use the rendered template to detect if thinking token is at the end.
|
||||
// This reuses the existing DetectThinkingStartToken function.
|
||||
if metadata.RenderedTemplate != "" {
|
||||
thinkingStartToken := reasoning.DetectThinkingStartToken(metadata.RenderedTemplate, &cfg.ReasoningConfig)
|
||||
thinkingForcedOpen := thinkingStartToken != ""
|
||||
cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(!thinkingForcedOpen)
|
||||
xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", thinkingForcedOpen, "thinking_start_token", thinkingStartToken)
|
||||
} else {
|
||||
cfg.ReasoningConfig.DisableReasoningTagPrefill = ptr.To(true)
|
||||
xlog.Debug("[gguf] DetectThinkingSupportFromBackend: thinking support detected", "supports_thinking", metadata.SupportsThinking, "thinking_forced_open", false)
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
xlog.Debug("[gguf] DetectThinkingSupportFromBackend: preserving explicit reasoning config", "supports_thinking", metadata.SupportsThinking, "disable_reasoning", *cfg.ReasoningConfig.DisableReasoning, "disable_reasoning_tag_prefill", *cfg.ReasoningConfig.DisableReasoningTagPrefill)
|
||||
}
|
||||
|
||||
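The pattern in `applyDetectedThinkingConfig` is "fill a default only where the user left the field nil". The sketch below reproduces that pattern with hypothetical types. In particular, `endsWithThinkTag` is a deliberately simplified stand-in for `reasoning.DetectThinkingStartToken`, whose real matching rules are not shown in this diff.

```go
package main

import (
	"fmt"
	"strings"
)

// reasoningConfig and probeResult are hypothetical stand-ins for the
// ReasoningConfig and ModelMetadataResponse types used in the diff.
type reasoningConfig struct {
	DisableReasoning           *bool
	DisableReasoningTagPrefill *bool
}

type probeResult struct {
	SupportsThinking bool
	RenderedTemplate string
}

func boolPtr(v bool) *bool { return &v }

// endsWithThinkTag is a simplified assumption: it only recognises a literal
// "<think>" at the end of the rendered template.
func endsWithThinkTag(rendered string) bool {
	return strings.HasSuffix(strings.TrimSpace(rendered), "<think>")
}

// applyDefaults fills each field only when it is still nil, so explicit
// user settings always win over what the backend probe reports.
func applyDefaults(cfg *reasoningConfig, probe *probeResult) {
	if cfg == nil || probe == nil {
		return
	}
	if cfg.DisableReasoning == nil {
		cfg.DisableReasoning = boolPtr(!probe.SupportsThinking)
	}
	if cfg.DisableReasoningTagPrefill == nil {
		forcedOpen := probe.RenderedTemplate != "" && endsWithThinkTag(probe.RenderedTemplate)
		cfg.DisableReasoningTagPrefill = boolPtr(!forcedOpen)
	}
}

func main() {
	// The user explicitly disabled reasoning but left prefill unset.
	cfg := &reasoningConfig{DisableReasoning: boolPtr(true)}
	applyDefaults(cfg, &probeResult{SupportsThinking: true, RenderedTemplate: "{{ bos_token }}<think>"})
	fmt.Println("disable_reasoning:", *cfg.DisableReasoning)
	fmt.Println("disable_reasoning_tag_prefill:", *cfg.DisableReasoningTagPrefill)
}
```

Using pointers rather than plain booleans is what makes "unset" distinguishable from an explicit `false`, which is the property the new tests in `gguf_reasoning_test.go` exercise.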
101
core/config/gguf_reasoning_test.go
Normal file
@@ -0,0 +1,101 @@
|
||||
package config
|
||||
|
||||
import (
|
||||
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
|
||||
"github.com/mudler/LocalAI/pkg/reasoning"
|
||||
|
||||
"github.com/gpustack/gguf-parser-go/util/ptr"
|
||||
. "github.com/onsi/ginkgo/v2"
|
||||
. "github.com/onsi/gomega"
|
||||
)
|
||||
|
||||
var _ = Describe("GGUF backend metadata reasoning defaults", func() {
|
||||
It("fills reasoning defaults when unset", func() {
|
||||
cfg := &ModelConfig{
|
||||
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
|
||||
}
|
||||
|
||||
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
|
||||
SupportsThinking: true,
|
||||
RenderedTemplate: "{{ bos_token }}<think>",
|
||||
})
|
||||
|
||||
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
|
||||
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeFalse())
|
||||
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
|
||||
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeFalse())
|
||||
})
|
||||
|
||||
It("preserves fully explicit reasoning settings", func() {
|
||||
cfg := &ModelConfig{
|
||||
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
|
||||
ReasoningConfig: reasoning.Config{
|
||||
DisableReasoning: ptr.To(true),
|
||||
DisableReasoningTagPrefill: ptr.To(true),
|
||||
},
|
||||
}
|
||||
|
||||
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
|
||||
SupportsThinking: true,
|
||||
RenderedTemplate: "{{ bos_token }}<think>",
|
||||
})
|
||||
|
||||
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
|
||||
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
|
||||
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
|
||||
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
|
||||
})
|
||||
|
||||
It("preserves explicit disable while still inferring missing prefill", func() {
|
||||
cfg := &ModelConfig{
|
||||
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
|
||||
ReasoningConfig: reasoning.Config{
|
||||
DisableReasoning: ptr.To(true),
|
||||
},
|
||||
}
|
||||
|
||||
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
|
||||
SupportsThinking: true,
|
||||
RenderedTemplate: "{{ bos_token }}<think>",
|
||||
})
|
||||
|
||||
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
|
||||
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
|
||||
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
|
||||
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeFalse())
|
||||
})
|
||||
|
||||
It("preserves explicit prefill while still inferring missing disable flag", func() {
|
||||
cfg := &ModelConfig{
|
||||
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
|
||||
ReasoningConfig: reasoning.Config{
|
||||
DisableReasoningTagPrefill: ptr.To(true),
|
||||
},
|
||||
}
|
||||
|
||||
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
|
||||
SupportsThinking: true,
|
||||
RenderedTemplate: "{{ bos_token }}<think>",
|
||||
})
|
||||
|
||||
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
|
||||
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeFalse())
|
||||
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
|
||||
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
|
||||
})
|
||||
|
||||
It("defaults to disabling reasoning when backend does not support thinking", func() {
|
||||
cfg := &ModelConfig{
|
||||
TemplateConfig: TemplateConfig{UseTokenizerTemplate: true},
|
||||
}
|
||||
|
||||
applyDetectedThinkingConfig(cfg, &pb.ModelMetadataResponse{
|
||||
SupportsThinking: false,
|
||||
})
|
||||
|
||||
Expect(cfg.ReasoningConfig.DisableReasoning).ToNot(BeNil())
|
||||
Expect(*cfg.ReasoningConfig.DisableReasoning).To(BeTrue())
|
||||
Expect(cfg.ReasoningConfig.DisableReasoningTagPrefill).ToNot(BeNil())
|
||||
Expect(*cfg.ReasoningConfig.DisableReasoningTagPrefill).To(BeTrue())
|
||||
})
|
||||
})
|
||||
@@ -193,9 +193,9 @@ func (bcl *ModelConfigLoader) ReadModelConfig(file string, opts ...ConfigLoaderO
 bcl.configs[c.Name] = *c
 } else {
 if err != nil {
-return fmt.Errorf("config is not valid: %w", err)
+return fmt.Errorf("model config %q is not valid: %w. Ensure the YAML file has a valid 'name' field and correct syntax. See https://localai.io/docs/getting-started/customize-model/ for config reference", file, err)
 }
-return fmt.Errorf("config is not valid")
+return fmt.Errorf("model config %q is not valid. Ensure the YAML file has a valid 'name' field and correct syntax. See https://localai.io/docs/getting-started/customize-model/ for config reference", file)
 }

 return nil
@@ -373,9 +373,9 @@ func (bcl *ModelConfigLoader) LoadModelConfigsFromPath(path string, opts ...Conf
 files = append(files, info)
 }
 for _, file := range files {
-// Skip templates, YAML and .keep files
-if !strings.Contains(file.Name(), ".yaml") && !strings.Contains(file.Name(), ".yml") ||
-strings.HasPrefix(file.Name(), ".") {
+// Only load real YAML config files and ignore dotfiles or backup variants
+ext := strings.ToLower(filepath.Ext(file.Name()))
+if (ext != ".yaml" && ext != ".yml") || strings.HasPrefix(file.Name(), ".") {
 continue
 }

|
||||
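The second hunk replaces a substring test with an extension test. The sketch below runs both predicates over a few file names to show where they disagree. `loadedByOldCheck` and `isModelConfigFile` are hypothetical helper names wrapping the two conditions from the diff.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// loadedByOldCheck mirrors the removed condition: any name that merely
// contains ".yaml" or ".yml" was loaded, unless it was a dotfile.
func loadedByOldCheck(name string) bool {
	skip := !strings.Contains(name, ".yaml") && !strings.Contains(name, ".yml") ||
		strings.HasPrefix(name, ".")
	return !skip
}

// isModelConfigFile mirrors the added condition: only the final extension
// counts, compared case-insensitively, and dotfiles are still ignored.
func isModelConfigFile(name string) bool {
	ext := strings.ToLower(filepath.Ext(name))
	return (ext == ".yaml" || ext == ".yml") && !strings.HasPrefix(name, ".")
}

func main() {
	names := []string{"foo.yaml", "foo.YML", "foo.yaml.bak", "foo.yaml.bak.123", ".hidden.yaml"}
	for _, name := range names {
		fmt.Printf("%-18s old=%-5v new=%v\n", name, loadedByOldCheck(name), isModelConfigFile(name))
	}
}
```

`filepath.Ext` returns only the suffix after the last dot, which is why backup files such as `foo.yaml.bak` no longer match.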
@@ -2,6 +2,7 @@ package config
|
||||
|
||||
import (
|
||||
"os"
|
||||
"path/filepath"
|
||||
|
||||
. "github.com/onsi/ginkgo/v2"
|
||||
. "github.com/onsi/gomega"
|
||||
@@ -109,5 +110,50 @@ options:
|
||||
Expect(testModel.Options).To(ContainElements("foo", "bar", "baz"))
|
||||
|
||||
})
|
||||
|
||||
It("Only loads files ending with yaml or yml", func() {
|
||||
tmpdir, err := os.MkdirTemp("", "model-config-loader")
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
defer os.RemoveAll(tmpdir)
|
||||
|
||||
err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml"), []byte(
|
||||
`name: "foo-model"
|
||||
description: "formal config"
|
||||
backend: "llama-cpp"
|
||||
parameters:
|
||||
model: "foo.gguf"
|
||||
`), 0644)
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
|
||||
err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml.bak"), []byte(
|
||||
`name: "foo-model"
|
||||
description: "backup config"
|
||||
backend: "llama-cpp"
|
||||
parameters:
|
||||
model: "foo-backup.gguf"
|
||||
`), 0644)
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
|
||||
err = os.WriteFile(filepath.Join(tmpdir, "foo.yaml.bak.123"), []byte(
|
||||
`name: "foo-backup-only"
|
||||
description: "timestamped backup config"
|
||||
backend: "llama-cpp"
|
||||
parameters:
|
||||
model: "foo-timestamped.gguf"
|
||||
`), 0644)
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
|
||||
bcl := NewModelConfigLoader(tmpdir)
|
||||
err = bcl.LoadModelConfigsFromPath(tmpdir)
|
||||
Expect(err).ToNot(HaveOccurred())
|
||||
|
||||
configs := bcl.GetAllModelsConfigs()
|
||||
Expect(configs).To(HaveLen(1))
|
||||
Expect(configs[0].Name).To(Equal("foo-model"))
|
||||
Expect(configs[0].Description).To(Equal("formal config"))
|
||||
|
||||
_, exists := bcl.GetModelConfig("foo-backup-only")
|
||||
Expect(exists).To(BeFalse())
|
||||
})
|
||||
})
|
||||
})
|
||||
|
||||
@@ -110,7 +110,13 @@ func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery,
 if err != nil {
 return err
 }
-if backends.Exists(name) {
+// Only short-circuit if the install is *actually usable*. An orphaned
+// meta entry whose concrete was removed still shows up in
+// ListSystemBackends with a RunFile pointing at a path that no longer
+// exists; returning early there leaves the caller with a broken
+// alias and the worker fails with "backend not found after install
+// attempt" on every retry. Re-install in that case.
+if existing, ok := backends.Get(name); ok && isBackendRunnable(existing) {
 return nil
 }
 }
@@ -375,17 +381,44 @@ func DeleteBackendFromSystem(systemState *system.SystemState, name string) error
 }

 if metadata != nil && metadata.MetaBackendFor != "" {
-metaBackendDirectory := filepath.Join(systemState.Backend.BackendsPath, metadata.MetaBackendFor)
-xlog.Debug("Deleting meta backend", "backendDirectory", metaBackendDirectory)
-if _, err := os.Stat(metaBackendDirectory); os.IsNotExist(err) {
-return fmt.Errorf("meta backend %q not found", metadata.MetaBackendFor)
+concreteDirectory := filepath.Join(systemState.Backend.BackendsPath, metadata.MetaBackendFor)
+xlog.Debug("Deleting concrete backend referenced by meta", "concreteDirectory", concreteDirectory)
+// If the concrete the meta points to is already gone (earlier delete,
+// partial install, or manual cleanup), keep going and remove the
+// orphaned meta dir. Previously we returned an error here, which made
+// the orphaned meta impossible to uninstall from the UI — the delete
+// kept failing and every subsequent install short-circuited because
+// the stale meta metadata made ListSystemBackends.Exists(name) true.
+if _, statErr := os.Stat(concreteDirectory); statErr == nil {
+os.RemoveAll(concreteDirectory)
+} else if os.IsNotExist(statErr) {
+xlog.Warn("Concrete backend referenced by meta not found — removing orphaned meta only",
+"meta", name, "concrete", metadata.MetaBackendFor)
+} else {
+return statErr
 }
-os.RemoveAll(metaBackendDirectory)
 }

 return os.RemoveAll(backendDirectory)
 }

+// isBackendRunnable reports whether the given backend entry can actually be
+// invoked. A meta backend is runnable only if its concrete's run.sh still
+// exists on disk; concrete backends are considered runnable as long as their
+// RunFile is set (ListSystemBackends only emits them when the runfile is
+// present). Used to guard the "already installed" short-circuit so an
+// orphaned meta pointing at a missing concrete triggers a real reinstall
+// rather than being silently skipped.
+func isBackendRunnable(b SystemBackend) bool {
+if b.RunFile == "" {
+return false
+}
+if fi, err := os.Stat(b.RunFile); err != nil || fi.IsDir() {
+return false
+}
+return true
+}
+
 type SystemBackend struct {
 Name string
 RunFile string
@@ -394,6 +427,23 @@ type SystemBackend struct {
|
||||
Metadata *BackendMetadata
|
||||
UpgradeAvailable bool `json:"upgrade_available,omitempty"`
|
||||
AvailableVersion string `json:"available_version,omitempty"`
|
||||
// Nodes holds per-node attribution in distributed mode. Empty in single-node.
|
||||
// Each entry describes a node that has this backend installed, with the
|
||||
// version/digest it reports. Lets the UI surface drift and per-node status.
|
||||
Nodes []NodeBackendRef `json:"nodes,omitempty"`
|
||||
}
|
||||
|
||||
// NodeBackendRef describes one node's view of an installed backend. Used both
|
||||
// for per-node attribution in the UI and for drift detection during upgrade
|
||||
// checks (a cluster with mismatched versions/digests is flagged upgradeable).
|
||||
type NodeBackendRef struct {
|
||||
NodeID string `json:"node_id"`
|
||||
NodeName string `json:"node_name"`
|
||||
NodeStatus string `json:"node_status"` // healthy | unhealthy | offline | draining | pending
|
||||
Version string `json:"version,omitempty"`
|
||||
Digest string `json:"digest,omitempty"`
|
||||
URI string `json:"uri,omitempty"`
|
||||
InstalledAt string `json:"installed_at,omitempty"`
|
||||
}
|
||||
|
||||
type SystemBackends map[string]SystemBackend
|
||||
|
||||
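The runnable check that guards the "already installed" short-circuit comes down to an `os.Stat` on the run file. The following sketch exercises that check against a temporary directory. `isRunnable` takes a plain path instead of the `SystemBackend` struct, and the directory names are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// isRunnable reports whether runFile names an existing non-directory entry.
// It follows the logic of isBackendRunnable but takes the path directly.
func isRunnable(runFile string) bool {
	if runFile == "" {
		return false
	}
	fi, err := os.Stat(runFile)
	return err == nil && !fi.IsDir()
}

// demoRunnable builds one backend with a run.sh and one orphaned path whose
// concrete directory never existed, then checks both.
func demoRunnable() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "backend-runnable")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	good := filepath.Join(dir, "concrete", "run.sh")
	if err := os.MkdirAll(filepath.Dir(good), 0o750); err != nil {
		return false, false, err
	}
	if err := os.WriteFile(good, []byte("#!/bin/sh\n"), 0o755); err != nil {
		return false, false, err
	}
	orphan := filepath.Join(dir, "concrete-gone", "run.sh")
	return isRunnable(good), isRunnable(orphan), nil
}

func main() {
	present, orphaned, err := demoRunnable()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("installed backend runnable:", present)
	fmt.Println("orphaned meta runnable:", orphaned)
}
```

Treating any `Stat` error as "not runnable" errs on the side of reinstalling, which matches the intent stated in the diff's comment.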
@@ -952,6 +952,58 @@ var _ = Describe("Gallery Backends", func() {
|
||||
err = DeleteBackendFromSystem(systemState, "non-existent")
|
||||
Expect(err).To(HaveOccurred())
|
||||
})
|
||||
|
||||
It("removes an orphaned meta backend whose concrete is missing", func() {
|
||||
// Real scenario from the dev cluster: the concrete got wiped
|
||||
// (partial install, manual cleanup, previous crash) but the meta
|
||||
// directory + metadata.json still points at it. The old code
|
||||
// errored with "meta backend X not found" and left the orphan in
|
||||
// place, making the backend impossible to uninstall.
|
||||
metaName := "meta-backend"
|
||||
concreteName := "concrete-backend-that-vanished"
|
||||
metaPath := filepath.Join(tempDir, metaName)
|
||||
Expect(os.MkdirAll(metaPath, 0750)).To(Succeed())
|
||||
|
||||
meta := BackendMetadata{Name: metaName, MetaBackendFor: concreteName}
|
||||
data, err := json.MarshalIndent(meta, "", " ")
|
||||
Expect(err).NotTo(HaveOccurred())
|
||||
Expect(os.WriteFile(filepath.Join(metaPath, "metadata.json"), data, 0644)).To(Succeed())
|
||||
|
||||
// Concrete directory intentionally absent.
|
||||
systemState, err := system.GetSystemState(system.WithBackendPath(tempDir))
|
||||
Expect(err).NotTo(HaveOccurred())
|
||||
|
||||
Expect(DeleteBackendFromSystem(systemState, metaName)).To(Succeed())
|
||||
Expect(metaPath).NotTo(BeADirectory())
|
||||
})
|
||||
})
|
||||
|
||||
Describe("InstallBackendFromGallery — orphaned meta reinstall", func() {
|
||||
It("re-runs install when the meta's concrete is missing", func() {
|
||||
// Seed state: meta dir exists with metadata pointing at a
|
||||
// concrete that was removed from disk. ListSystemBackends still
|
||||
// surfaces the meta via its metadata.Name → the old short-circuit
|
||||
// at `if backends.Exists(name) { return nil }` returned silently,
|
||||
// leaving the worker's findBackend() with a dead alias forever.
|
||||
// The fix: require the backend to be runnable before we skip.
|
||||
metaName := "meta-orphan"
|
||||
concreteName := "concrete-gone"
|
||||
metaPath := filepath.Join(tempDir, metaName)
|
||||
Expect(os.MkdirAll(metaPath, 0750)).To(Succeed())
|
||||
meta := BackendMetadata{Name: metaName, MetaBackendFor: concreteName}
|
||||
data, err := json.MarshalIndent(meta, "", " ")
|
||||
Expect(err).NotTo(HaveOccurred())
|
||||
Expect(os.WriteFile(filepath.Join(metaPath, "metadata.json"), data, 0644)).To(Succeed())
|
||||
|
||||
systemState, err := system.GetSystemState(system.WithBackendPath(tempDir))
|
||||
Expect(err).NotTo(HaveOccurred())
|
||||
|
||||
listed, err := ListSystemBackends(systemState)
|
||||
Expect(err).NotTo(HaveOccurred())
|
||||
b, ok := listed.Get(metaName)
|
||||
Expect(ok).To(BeTrue())
|
||||
Expect(isBackendRunnable(b)).To(BeFalse()) // concrete run.sh absent
|
||||
})
|
||||
})
|
||||
|
||||
Describe("ListSystemBackends", func() {
|
||||
|
||||
@@ -23,22 +23,45 @@ type UpgradeInfo struct {
|
||||
AvailableVersion string `json:"available_version"`
|
||||
InstalledDigest string `json:"installed_digest,omitempty"`
|
||||
AvailableDigest string `json:"available_digest,omitempty"`
|
||||
// NodeDrift lists nodes whose installed version or digest differs from
|
||||
// the cluster majority. Non-empty means the cluster has diverged and an
|
||||
// upgrade will realign it. Empty in single-node mode.
|
||||
NodeDrift []NodeDriftInfo `json:"node_drift,omitempty"`
|
||||
}
|
||||
|
||||
// CheckBackendUpgrades compares installed backends against gallery entries
|
||||
// and returns a map of backend names to UpgradeInfo for those that have
|
||||
// newer versions or different OCI digests available.
|
||||
// NodeDriftInfo describes one node that disagrees with the cluster majority
// on which version/digest of a backend is installed.
type NodeDriftInfo struct {
	NodeID   string `json:"node_id"`
	NodeName string `json:"node_name"`
	Version  string `json:"version,omitempty"`
	Digest   string `json:"digest,omitempty"`
}
|
||||
|
||||
// CheckBackendUpgrades is the single-node entrypoint. Distributed callers use
|
||||
// CheckUpgradesAgainst directly with their aggregated SystemBackends.
|
||||
func CheckBackendUpgrades(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState) (map[string]UpgradeInfo, error) {
|
||||
installed, err := ListSystemBackends(systemState)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("failed to list installed backends: %w", err)
|
||||
}
|
||||
return CheckUpgradesAgainst(ctx, galleries, systemState, installed)
|
||||
}
|
||||
|
||||
// CheckUpgradesAgainst compares a caller-supplied SystemBackends set against
|
||||
// the gallery. Fixes the distributed-mode bug where the old code passed the
|
||||
// frontend's (empty) local filesystem through ListSystemBackends and so never
|
||||
// surfaced any upgrades.
|
||||
//
|
||||
// Cluster drift policy: if a backend's per-node versions/digests disagree, the
|
||||
// row is flagged upgradeable regardless of whether any node matches the gallery
|
||||
// — next Upgrade All realigns the cluster. NodeDrift lists the outliers.
|
||||
func CheckUpgradesAgainst(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, installedBackends SystemBackends) (map[string]UpgradeInfo, error) {
|
||||
galleryBackends, err := AvailableBackends(galleries, systemState)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("failed to list available backends: %w", err)
|
||||
}
|
||||
|
||||
installedBackends, err := ListSystemBackends(systemState)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("failed to list installed backends: %w", err)
|
||||
}
|
||||
|
||||
result := make(map[string]UpgradeInfo)
|
||||
|
||||
for _, installed := range installedBackends {
|
||||
@@ -57,34 +80,48 @@ func CheckBackendUpgrades(ctx context.Context, galleries []config.Gallery, syste
|
||||
}
|
||||
|
||||
installedVersion := installed.Metadata.Version
|
||||
installedDigest := installed.Metadata.Digest
|
||||
galleryVersion := galleryEntry.Version
|
||||
|
||||
// If both sides have versions, compare them
|
||||
// Detect cluster drift: does every node report the same version+digest?
|
||||
// In single-node mode this stays empty (Nodes is nil).
|
||||
majority, drift := summarizeNodeDrift(installed.Nodes)
|
||||
if majority.version != "" {
|
||||
installedVersion = majority.version
|
||||
}
|
||||
if majority.digest != "" {
|
||||
installedDigest = majority.digest
|
||||
}
|
||||
|
||||
makeInfo := func(info UpgradeInfo) UpgradeInfo {
|
||||
info.NodeDrift = drift
|
||||
return info
|
||||
}
|
||||
|
||||
// If versions are available on both sides, they're the source of truth.
|
||||
if galleryVersion != "" && installedVersion != "" {
|
||||
if galleryVersion != installedVersion {
|
||||
result[installed.Metadata.Name] = UpgradeInfo{
|
||||
if galleryVersion != installedVersion || len(drift) > 0 {
|
||||
result[installed.Metadata.Name] = makeInfo(UpgradeInfo{
|
||||
BackendName: installed.Metadata.Name,
|
||||
InstalledVersion: installedVersion,
|
||||
AvailableVersion: galleryVersion,
|
||||
}
|
||||
})
|
||||
}
|
||||
// Versions match — no upgrade needed
|
||||
continue
|
||||
}
|
||||
|
||||
// Gallery has a version but installed doesn't — this happens for backends
|
||||
// installed before version tracking was added. Flag as upgradeable so
|
||||
// users can re-install to pick up version metadata.
|
||||
// Gallery has a version but installed doesn't — backends installed before
|
||||
// version tracking was added. Flag as upgradeable to pick up metadata.
|
||||
if galleryVersion != "" && installedVersion == "" {
|
||||
result[installed.Metadata.Name] = UpgradeInfo{
|
||||
result[installed.Metadata.Name] = makeInfo(UpgradeInfo{
|
||||
BackendName: installed.Metadata.Name,
|
||||
InstalledVersion: "",
|
||||
AvailableVersion: galleryVersion,
|
||||
}
|
||||
})
|
||||
continue
|
||||
}
|
||||
|
||||
// Fall back to OCI digest comparison when versions are unavailable
|
||||
// Fall back to OCI digest comparison when versions are unavailable.
|
||||
if downloader.URI(galleryEntry.URI).LooksLikeOCI() {
|
||||
remoteDigest, err := oci.GetImageDigest(galleryEntry.URI, "", nil, nil)
|
||||
if err != nil {
|
||||
@@ -92,21 +129,68 @@ func CheckBackendUpgrades(ctx context.Context, galleries []config.Gallery, syste
|
||||
continue
|
||||
}
|
||||
// If we have a stored digest, compare; otherwise any remote digest
|
||||
// means we can't confirm we're up to date — flag as upgradeable
|
||||
if installed.Metadata.Digest == "" || remoteDigest != installed.Metadata.Digest {
|
||||
result[installed.Metadata.Name] = UpgradeInfo{
|
||||
// means we can't confirm we're up to date — flag as upgradeable.
|
||||
if installedDigest == "" || remoteDigest != installedDigest || len(drift) > 0 {
|
||||
result[installed.Metadata.Name] = makeInfo(UpgradeInfo{
|
||||
BackendName: installed.Metadata.Name,
|
||||
InstalledDigest: installed.Metadata.Digest,
|
||||
InstalledDigest: installedDigest,
|
||||
AvailableDigest: remoteDigest,
|
||||
}
|
||||
})
|
||||
}
|
||||
} else if len(drift) > 0 {
|
||||
// No version/digest path but nodes disagree — still worth flagging.
|
||||
result[installed.Metadata.Name] = makeInfo(UpgradeInfo{
|
||||
BackendName: installed.Metadata.Name,
|
||||
InstalledVersion: installedVersion,
|
||||
InstalledDigest: installedDigest,
|
||||
})
|
||||
}
|
||||
// No version info and non-OCI URI — cannot determine, skip
|
||||
}
|
||||
|
||||
return result, nil
|
||||
}
|
||||
|
||||
// summarizeNodeDrift collapses per-node version/digest tuples to a majority
// pair and returns the outliers. In single-node mode (empty nodes slice) this
// returns zero values and a nil drift list.
func summarizeNodeDrift(nodes []NodeBackendRef) (majority struct{ version, digest string }, drift []NodeDriftInfo) {
	if len(nodes) == 0 {
		return majority, nil
	}

	type key struct{ version, digest string }
	counts := map[key]int{}
	var topKey key
	var topCount int
	for _, n := range nodes {
		k := key{n.Version, n.Digest}
		counts[k]++
		if counts[k] > topCount {
			topCount = counts[k]
			topKey = k
		}
	}

	majority.version = topKey.version
	majority.digest = topKey.digest

	if len(counts) == 1 {
		return majority, nil // unanimous — no drift
	}
	for _, n := range nodes {
		if n.Version == majority.version && n.Digest == majority.digest {
			continue
		}
		drift = append(drift, NodeDriftInfo{
			NodeID:   n.NodeID,
			NodeName: n.NodeName,
			Version:  n.Version,
			Digest:   n.Digest,
		})
	}
	return majority, drift
}
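The majority tally performed by summarizeNodeDrift can be tried in isolation. The sketch below is a standalone approximation, not the project's code: `nodeRef`, `pair`, and `majorityAndDrift` are hypothetical names standing in for NodeBackendRef, the anonymous result struct, and the function itself. It also illustrates a consequence of the strict `>` comparison: on a tie, the pair that first reaches the top count is reported as the majority.

```go
package main

import "fmt"

// nodeRef is a hypothetical, trimmed-down stand-in for NodeBackendRef.
type nodeRef struct {
	Name, Version, Digest string
}

// pair is a hypothetical name for the (version, digest) tuple being tallied.
type pair struct{ Version, Digest string }

// majorityAndDrift returns the most common (version, digest) pair and the
// names of the nodes that report something else.
func majorityAndDrift(nodes []nodeRef) (pair, []string) {
	var top pair
	if len(nodes) == 0 {
		return top, nil
	}
	counts := map[pair]int{}
	topCount := 0
	for _, n := range nodes {
		k := pair{n.Version, n.Digest}
		counts[k]++
		if counts[k] > topCount { // strict '>' keeps the earlier leader on ties
			topCount = counts[k]
			top = k
		}
	}
	if len(counts) == 1 {
		return top, nil // unanimous, so no drift
	}
	var drift []string
	for _, n := range nodes {
		if (pair{n.Version, n.Digest}) != top {
			drift = append(drift, n.Name)
		}
	}
	return top, drift
}

func main() {
	nodes := []nodeRef{
		{Name: "worker-1", Version: "2.0.0"},
		{Name: "worker-2", Version: "2.0.0"},
		{Name: "worker-3", Version: "1.0.0"},
	}
	top, drift := majorityAndDrift(nodes)
	fmt.Println(top.Version, drift) // prints: 2.0.0 [worker-3]
}
```

Because the caller flags the backend as upgradeable whenever the drift list is non-empty, which pair wins a tie only affects the version and digest reported as installed, not whether the row is flagged.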
|
||||
|
||||
// UpgradeBackend upgrades a single backend to the latest gallery version using
|
||||
// an atomic swap with backup-based rollback on failure.
|
||||
func UpgradeBackend(ctx context.Context, systemState *system.SystemState, modelLoader *model.ModelLoader, galleries []config.Gallery, backendName string, downloadStatus func(string, string, string, float64)) error {
|
||||
|
||||
@@ -144,6 +144,97 @@ var _ = Describe("Upgrade Detection and Execution", func() {
|
||||
})
|
||||
})
|
||||
|
||||
// CheckUpgradesAgainst is the entry point used by DistributedBackendManager.
|
||||
// It takes installed backends directly — typically aggregated from workers —
|
||||
// instead of reading the frontend filesystem. These tests exercise drift
|
||||
// detection, which is the feature the distributed path relies on.
|
||||
Describe("CheckUpgradesAgainst (distributed)", func() {
|
||||
It("flags upgrade when cluster nodes disagree on version, even if gallery matches majority", func() {
|
||||
writeGalleryYAML([]GalleryBackend{
|
||||
{
|
||||
Metadata: Metadata{Name: "my-backend"},
|
||||
URI: filepath.Join(tempDir, "some-source"),
|
||||
Version: "2.0.0",
|
||||
},
|
||||
})
|
||||
|
||||
installed := SystemBackends{
|
||||
"my-backend": SystemBackend{
|
||||
Name: "my-backend",
|
||||
Metadata: &BackendMetadata{Name: "my-backend", Version: "2.0.0"},
|
||||
Nodes: []NodeBackendRef{
|
||||
{NodeID: "a", NodeName: "worker-1", Version: "2.0.0"},
|
||||
{NodeID: "b", NodeName: "worker-2", Version: "2.0.0"},
|
||||
{NodeID: "c", NodeName: "worker-3", Version: "1.0.0"}, // drift
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
upgrades, err := CheckUpgradesAgainst(context.Background(), galleries, systemState, installed)
|
||||
Expect(err).NotTo(HaveOccurred())
|
||||
Expect(upgrades).To(HaveKey("my-backend"))
|
||||
info := upgrades["my-backend"]
|
||||
Expect(info.AvailableVersion).To(Equal("2.0.0"))
|
||||
Expect(info.NodeDrift).To(HaveLen(1))
|
||||
Expect(info.NodeDrift[0].NodeName).To(Equal("worker-3"))
|
||||
Expect(info.NodeDrift[0].Version).To(Equal("1.0.0"))
|
||||
})
|
||||
|
||||
It("does not flag upgrade when all nodes agree and match gallery", func() {
|
||||
writeGalleryYAML([]GalleryBackend{
|
||||
{
|
||||
Metadata: Metadata{Name: "my-backend"},
|
||||
URI: filepath.Join(tempDir, "some-source"),
|
||||
Version: "2.0.0",
|
||||
},
|
||||
})
|
||||
|
||||
installed := SystemBackends{
|
||||
"my-backend": SystemBackend{
|
||||
Name: "my-backend",
|
||||
Metadata: &BackendMetadata{Name: "my-backend", Version: "2.0.0"},
|
||||
Nodes: []NodeBackendRef{
|
||||
{NodeID: "a", NodeName: "worker-1", Version: "2.0.0"},
|
||||
{NodeID: "b", NodeName: "worker-2", Version: "2.0.0"},
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
upgrades, err := CheckUpgradesAgainst(context.Background(), galleries, systemState, installed)
|
||||
Expect(err).NotTo(HaveOccurred())
|
||||
Expect(upgrades).To(BeEmpty())
|
||||
})
|
||||
|
||||
It("surfaces empty-installed-version path the old distributed code silently missed", func() {
|
||||
// Simulates the real-world bug: worker has a backend, its version
|
||||
// is empty (pre-tracking or OCI-pinned-to-latest), gallery has a
|
||||
// version. Pre-fix CheckUpgrades returned nothing; now it surfaces.
|
||||
writeGalleryYAML([]GalleryBackend{
|
||||
{
|
||||
Metadata: Metadata{Name: "my-backend"},
|
||||
URI: filepath.Join(tempDir, "some-source"),
|
||||
Version: "2.0.0",
|
||||
},
|
||||
})
|
||||
|
||||
installed := SystemBackends{
|
||||
"my-backend": SystemBackend{
|
||||
Name: "my-backend",
|
||||
Metadata: &BackendMetadata{Name: "my-backend"},
|
||||
Nodes: []NodeBackendRef{
|
||||
{NodeID: "a", NodeName: "worker-1"},
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
upgrades, err := CheckUpgradesAgainst(context.Background(), galleries, systemState, installed)
|
||||
Expect(err).NotTo(HaveOccurred())
|
||||
Expect(upgrades).To(HaveKey("my-backend"))
|
||||
Expect(upgrades["my-backend"].InstalledVersion).To(BeEmpty())
|
||||
Expect(upgrades["my-backend"].AvailableVersion).To(Equal("2.0.0"))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("UpgradeBackend", func() {
|
||||
It("should replace backend directory and update metadata", func() {
|
||||
// Install v1
|
||||
|
||||
@@ -9,19 +9,26 @@ import (
|
||||
// BackendMonitorEndpoint returns the status of the specified backend
|
||||
// @Summary Backend monitor endpoint
|
||||
// @Tags monitoring
|
||||
// @Param request body schema.BackendMonitorRequest true "Backend statistics request"
|
||||
// @Param model query string true "Name of the model to monitor"
|
||||
// @Success 200 {object} proto.StatusResponse "Response"
|
||||
// @Router /backend/monitor [get]
|
||||
func BackendMonitorEndpoint(bm *monitoring.BackendMonitorService) echo.HandlerFunc {
|
||||
return func(c echo.Context) error {
|
||||
|
||||
input := new(schema.BackendMonitorRequest)
|
||||
// Get input data from the request body
|
||||
if err := c.Bind(input); err != nil {
|
||||
return err
|
||||
model := c.QueryParam("model")
|
||||
// Fall back to binding the request body so pre-existing clients that
|
||||
// sent `{"model": "..."}` with GET keep working.
|
||||
if model == "" {
|
||||
input := new(schema.BackendMonitorRequest)
|
||||
if err := c.Bind(input); err != nil {
|
||||
return err
|
||||
}
|
||||
model = input.Model
|
||||
}
|
||||
if model == "" {
|
||||
return echo.NewHTTPError(400, "model query parameter is required")
|
||||
}
|
||||
|
||||
resp, err := bm.CheckAndSample(input.Model)
|
||||
resp, err := bm.CheckAndSample(model)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
@@ -376,7 +376,7 @@ func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.Handler
|
||||
if err := c.Bind(&req); err != nil || req.Backend == "" {
|
||||
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name required"))
|
||||
}
|
||||
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries)
|
||||
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, "", "", "")
|
||||
if err != nil {
|
||||
xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "error", err)
|
||||
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to install backend on node"))
|
||||
|
||||
@@ -110,6 +110,27 @@ func UpdateSettingsEndpoint(app *application.Application) echo.HandlerFunc {
|
||||
})
|
||||
}
|
||||
|
||||
// The UI reads ApiKeys from GET /api/settings, which already returns the
|
||||
// merged env+runtime list. When the user clicks Save, the same merged
|
||||
// list comes back in the POST body. Strip the env-supplied keys from
|
||||
// the incoming list before we persist or re-merge, otherwise each save
|
||||
// duplicates the env keys on top of the previous merge (#9071).
|
||||
if settings.ApiKeys != nil {
|
||||
envKeys := startupConfig.ApiKeys
|
||||
envSet := make(map[string]struct{}, len(envKeys))
|
||||
for _, k := range envKeys {
|
||||
envSet[k] = struct{}{}
|
||||
}
|
||||
runtimeOnly := make([]string, 0, len(*settings.ApiKeys))
|
||||
for _, k := range *settings.ApiKeys {
|
||||
if _, fromEnv := envSet[k]; fromEnv {
|
||||
continue
|
||||
}
|
||||
runtimeOnly = append(runtimeOnly, k)
|
||||
}
|
||||
settings.ApiKeys = &runtimeOnly
|
||||
}
|
||||
|
||||
settingsFile := filepath.Join(appConfig.DynamicConfigsDir, "runtime_settings.json")
|
||||
settingsJSON, err := json.MarshalIndent(settings, "", " ")
|
||||
if err != nil {
|
||||
|
||||
@@ -147,6 +147,7 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
|
||||
result := ""
|
||||
lastEmittedCount := 0
|
||||
sentInitialRole := false
|
||||
sentReasoning := false
|
||||
hasChatDeltaToolCalls := false
|
||||
hasChatDeltaContent := false
|
||||
|
||||
@@ -190,6 +191,7 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
|
||||
}},
|
||||
Object: "chat.completion.chunk",
|
||||
}
|
||||
sentReasoning = true
|
||||
}
|
||||
|
||||
// Stream content deltas (cleaned of reasoning tags) while no tool calls
|
||||
@@ -363,7 +365,12 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
|
||||
functionResults = functions.ParseFunctionCall(cleanedResult, config.FunctionsConfig)
|
||||
}
|
||||
xlog.Debug("[ChatDeltas] final tool call decision", "tool_calls", len(functionResults), "text_content", *textContentToReturn)
|
||||
noActionToRun := len(functionResults) > 0 && functionResults[0].Name == noAction || len(functionResults) == 0
|
||||
// noAction is a sentinel "just answer" pseudo-function — not a real
|
||||
// tool call. Scan the whole slice rather than only index 0 so we
|
||||
// don't drop a real tool call that happens to follow a noAction
|
||||
// entry, and so the default branch isn't entered with only noAction
|
||||
// entries to emit as tool_calls.
|
||||
noActionToRun := !hasRealCall(functionResults, noAction)
|
||||
|
||||
switch {
|
||||
case noActionToRun:
|
||||
@@ -377,108 +384,31 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
|
||||
usage.TimingPromptProcessing = tokenUsage.TimingPromptProcessing
|
||||
}
|
||||
|
||||
if sentInitialRole {
|
||||
// Content was already streamed during the callback — just emit usage.
|
||||
delta := &schema.Message{}
|
||||
if reasoning != "" && extractor.Reasoning() == "" {
|
||||
delta.Reasoning = &reasoning
|
||||
}
|
||||
responses <- schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: req.Model,
|
||||
Choices: []schema.Choice{{Delta: delta, Index: 0}},
|
||||
Object: "chat.completion.chunk",
|
||||
Usage: usage,
|
||||
}
|
||||
} else {
|
||||
// Content was NOT streamed — send everything at once (fallback).
|
||||
responses <- schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: req.Model,
|
||||
Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0}},
|
||||
Object: "chat.completion.chunk",
|
||||
}
|
||||
|
||||
result, err := handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
|
||||
if err != nil {
|
||||
xlog.Error("error handling question", "error", err)
|
||||
return err
|
||||
}
|
||||
|
||||
delta := &schema.Message{Content: &result}
|
||||
if reasoning != "" {
|
||||
delta.Reasoning = &reasoning
|
||||
}
|
||||
responses <- schema.OpenAIResponse{
|
||||
ID: id, Created: created, Model: req.Model,
|
||||
Choices: []schema.Choice{{Delta: delta, Index: 0}},
|
||||
Object: "chat.completion.chunk",
|
||||
Usage: usage,
|
||||
var result string
|
||||
if !sentInitialRole {
|
||||
var hqErr error
|
||||
result, hqErr = handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
|
||||
if hqErr != nil {
|
||||
xlog.Error("error handling question", "error", hqErr)
|
||||
return hqErr
|
||||
}
|
||||
}
|
||||
for _, chunk := range buildNoActionFinalChunks(
|
||||
id, req.Model, created,
|
||||
sentInitialRole, sentReasoning,
|
||||
result, reasoning, usage,
|
||||
) {
|
||||
responses <- chunk
|
||||
}
|
||||
|
||||
default:
|
||||
for i, ss := range functionResults {
|
||||
name, args := ss.Name, ss.Arguments
|
||||
toolCallID := ss.ID
|
||||
if toolCallID == "" {
|
||||
toolCallID = id
|
||||
}
|
||||
|
||||
if i < lastEmittedCount {
|
||||
// Already emitted during streaming by the incremental
|
||||
// JSON/XML parser — skip to avoid duplicate tool calls.
|
||||
continue
|
||||
}
|
||||
|
||||
// Tool call not yet emitted — send name + args (two chunks).
|
||||
initialMessage := schema.OpenAIResponse{
|
||||
ID: id,
|
||||
Created: created,
|
||||
Model: req.Model,
|
||||
Choices: []schema.Choice{{
|
||||
Delta: &schema.Message{
|
||||
Role: "assistant",
|
||||
ToolCalls: []schema.ToolCall{
|
||||
{
|
||||
Index: i,
|
||||
ID: toolCallID,
|
||||
Type: "function",
|
||||
FunctionCall: schema.FunctionCall{
|
||||
Name: name,
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
Index: 0,
|
||||
FinishReason: nil,
|
||||
}},
|
||||
Object: "chat.completion.chunk",
|
||||
}
|
||||
responses <- initialMessage
|
||||
|
||||
responses <- schema.OpenAIResponse{
|
||||
ID: id,
|
||||
Created: created,
|
||||
Model: req.Model,
|
||||
Choices: []schema.Choice{{
|
||||
Delta: &schema.Message{
|
||||
Role: "assistant",
|
||||
Content: textContentToReturn,
|
||||
ToolCalls: []schema.ToolCall{
|
||||
{
|
||||
Index: i,
|
||||
ID: toolCallID,
|
||||
Type: "function",
|
||||
FunctionCall: schema.FunctionCall{
|
||||
Arguments: args,
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
Index: 0,
|
||||
FinishReason: nil,
|
||||
}},
|
||||
Object: "chat.completion.chunk",
|
||||
}
|
||||
for _, chunk := range buildDeferredToolCallChunks(
|
||||
id, req.Model, created,
|
||||
functionResults, lastEmittedCount,
|
||||
sentInitialRole, *textContentToReturn,
|
||||
sentReasoning, reasoning,
|
||||
) {
|
||||
responses <- chunk
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
233
core/http/endpoints/openai/chat_emit.go
Normal file
@@ -0,0 +1,233 @@
|
||||
package openai

import (
	"fmt"

	"github.com/mudler/LocalAI/core/schema"
	"github.com/mudler/LocalAI/pkg/functions"
)

// hasRealCall reports whether functionResults contains at least one
// entry whose Name is something other than the noAction sentinel.
// Used by processTools to decide between the "answer the question"
// path and the real tool-call flush.
func hasRealCall(functionResults []functions.FuncCallResults, noAction string) bool {
	for _, fc := range functionResults {
		if fc.Name != noAction {
			return true
		}
	}
	return false
}
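To see what changes when the whole slice is scanned instead of only index 0, the two predicates can be compared side by side. This is a minimal sketch under stated assumptions: `call` is a hypothetical stand-in for functions.FuncCallResults, the sentinel name "answer" and the tool name "get_weather" are chosen only for illustration, and `oldNoActionToRun` / `newNoActionToRun` are hypothetical names for the expression that was replaced and for `!hasRealCall(...)`.

```go
package main

import "fmt"

// call is a hypothetical stand-in for functions.FuncCallResults; only the
// Name field matters for this comparison.
type call struct{ Name string }

// oldNoActionToRun mirrors the previous check, which looked at index 0 only.
func oldNoActionToRun(calls []call, noAction string) bool {
	return len(calls) > 0 && calls[0].Name == noAction || len(calls) == 0
}

// newNoActionToRun mirrors the replacement: true only when no real call exists.
func newNoActionToRun(calls []call, noAction string) bool {
	for _, c := range calls {
		if c.Name != noAction {
			return false
		}
	}
	return true
}

func main() {
	const noAction = "answer" // assumed sentinel name, for illustration only
	cases := [][]call{
		nil,
		{{Name: "answer"}},
		{{Name: "answer"}, {Name: "get_weather"}},
		{{Name: "get_weather"}, {Name: "answer"}},
	}
	for _, cs := range cases {
		fmt.Printf("%v old=%v new=%v\n", cs,
			oldNoActionToRun(cs, noAction), newNoActionToRun(cs, noAction))
	}
}
```

The two predicates disagree only on the third case: the old check treats a noAction entry at index 0 as "nothing to run" and so drops the real call that follows it.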
|
||||
|
||||
// buildNoActionFinalChunks produces the closing SSE chunks for the
// noActionToRun branch of processTools (i.e. the model chose the "answer"
// pseudo-function or emitted no tool calls at all).
//
// When content was already streamed (contentAlreadyStreamed=true) the
// helper emits a single trailing usage chunk, optionally carrying
// reasoning that was produced but not streamed incrementally. When
// content was not streamed it emits a role chunk followed by a
// content+reasoning+usage chunk — the "send everything at once" fallback.
//
// Reasoning re-emission is guarded by reasoningAlreadyStreamed, not by
// probing the extractor's Go-side state: the C++ autoparser delivers
// reasoning through ProcessChatDeltaReasoning which populates a
// separate accumulator that extractor.Reasoning() does not expose.
// Without this guard the callback would stream reasoning incrementally
// and the final chunk would duplicate it.
func buildNoActionFinalChunks(
	id, model string,
	created int,
	contentAlreadyStreamed bool,
	reasoningAlreadyStreamed bool,
	content string,
	reasoning string,
	usage schema.OpenAIUsage,
) []schema.OpenAIResponse {
	var out []schema.OpenAIResponse

	if contentAlreadyStreamed {
		delta := &schema.Message{}
		if reasoning != "" && !reasoningAlreadyStreamed {
			r := reasoning
			delta.Reasoning = &r
		}
		out = append(out, schema.OpenAIResponse{
			ID: id, Created: created, Model: model,
			Choices: []schema.Choice{{Delta: delta, Index: 0}},
			Object: "chat.completion.chunk",
			Usage: usage,
		})
		return out
	}

	// Content was not streamed — send role, then content (+reasoning) + usage.
	out = append(out, schema.OpenAIResponse{
		ID: id, Created: created, Model: model,
		Choices: []schema.Choice{{
			Delta: &schema.Message{Role: "assistant"},
			Index: 0,
		}},
		Object: "chat.completion.chunk",
	})

	c := content
	delta := &schema.Message{Content: &c}
	if reasoning != "" && !reasoningAlreadyStreamed {
		r := reasoning
		delta.Reasoning = &r
	}
	out = append(out, schema.OpenAIResponse{
		ID: id, Created: created, Model: model,
		Choices: []schema.Choice{{Delta: delta, Index: 0}},
		Object: "chat.completion.chunk",
		Usage: usage,
	})
	return out
}
|
||||
|
||||
// buildDeferredToolCallChunks produces the SSE chunks for tool calls that
// were discovered only during final parsing (i.e. after the streaming
// callback finished). The caller forwards every returned chunk to the
// responses channel.
//
// Guarantees:
//   - tool calls with i < lastEmittedCount are skipped (already streamed)
//   - each emitted call yields two chunks: name-only, then args-only
//   - no chunk ever carries both non-empty Content and non-empty ToolCalls
//   - no chunk ever carries both non-empty Reasoning and non-empty ToolCalls
//   - if !reasoningAlreadyStreamed && reasoningContent != "",
//     a reasoning chunk is emitted first
//   - if !contentAlreadyStreamed && textContent != "",
//     a role chunk followed by a content chunk is emitted (after reasoning)
//   - chunks order: [reasoning?] [role+content?] (name, args)+
//   - fallback IDs for empty ss.ID are unique per index so a client can
//     match tool_result messages back to the right call
func buildDeferredToolCallChunks(
	id, model string,
	created int,
	functionResults []functions.FuncCallResults,
	lastEmittedCount int,
	contentAlreadyStreamed bool,
	textContent string,
	reasoningAlreadyStreamed bool,
	reasoningContent string,
) []schema.OpenAIResponse {
	// If every call was already emitted incrementally there's nothing to
	// flush — and no reason to emit a standalone reasoning/content chunk.
	hasDeferred := false
	for i := range functionResults {
		if i >= lastEmittedCount {
			hasDeferred = true
			break
		}
	}
	if !hasDeferred {
		return nil
	}

	var out []schema.OpenAIResponse

	// Reasoning first — the callback path at processTools emits reasoning
	// incrementally in its own chunks, but when the C++ autoparser only
	// surfaces reasoning as a final aggregate the callback never sees it.
	// Recover it here (no duplication: contentAlreadyStreamed and
	// reasoningAlreadyStreamed track what the callback already sent).
	if !reasoningAlreadyStreamed && reasoningContent != "" {
		r := reasoningContent
		out = append(out, schema.OpenAIResponse{
			ID: id, Created: created, Model: model,
			Choices: []schema.Choice{{
				Delta: &schema.Message{Reasoning: &r},
				Index: 0,
			}},
			Object: "chat.completion.chunk",
		})
	}

	// Then content, when it wasn't streamed via the callback. Emit role
	// and content in separate deltas — the OpenAI streaming contract
	// forbids bundling content alongside tool_calls in one delta.
	if !contentAlreadyStreamed && textContent != "" {
		out = append(out, schema.OpenAIResponse{
			ID: id, Created: created, Model: model,
			Choices: []schema.Choice{{
				Delta: &schema.Message{Role: "assistant"},
				Index: 0,
			}},
			Object: "chat.completion.chunk",
		})
		c := textContent
		out = append(out, schema.OpenAIResponse{
			ID: id, Created: created, Model: model,
			Choices: []schema.Choice{{
				Delta: &schema.Message{Content: &c},
				Index: 0,
			}},
			Object: "chat.completion.chunk",
		})
	}

	for i, ss := range functionResults {
		if i < lastEmittedCount {
			// Already streamed by the incremental JSON/XML parser during
			// the token callback — skip to avoid a duplicate emission.
			continue
		}

		toolCallID := ss.ID
		if toolCallID == "" {
			// Unique per-index fallback so multiple empty-ID calls don't
			// collide on the same request ID (clients match tool results
			// back by tool_call_id).
			toolCallID = fmt.Sprintf("%s-%d", id, i)
		}

		// Name chunk.
		out = append(out, schema.OpenAIResponse{
			ID: id, Created: created, Model: model,
			Choices: []schema.Choice{{
				Delta: &schema.Message{
					Role: "assistant",
					ToolCalls: []schema.ToolCall{{
						Index: i,
						ID: toolCallID,
						Type: "function",
						FunctionCall: schema.FunctionCall{
							Name: ss.Name,
						},
					}},
				},
				Index: 0,
				FinishReason: nil,
			}},
			Object: "chat.completion.chunk",
		})

		// Args chunk — no Content here. Either it was streamed through
		// the token callback earlier, or the role+content pair above
		// already delivered it.
		out = append(out, schema.OpenAIResponse{
			ID: id, Created: created, Model: model,
			Choices: []schema.Choice{{
				Delta: &schema.Message{
					Role: "assistant",
					ToolCalls: []schema.ToolCall{{
						Index: i,
						ID: toolCallID,
						Type: "function",
						FunctionCall: schema.FunctionCall{
							Arguments: ss.Arguments,
						},
					}},
				},
				Index: 0,
				FinishReason: nil,
			}},
			Object: "chat.completion.chunk",
		})
	}

	return out
}
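The per-index fallback ID is the part of this helper that clients depend on most directly, since tool results are matched back by tool_call_id. A minimal sketch of that rule, assuming a hypothetical helper name `fallbackToolCallID` and made-up request and call IDs:

```go
package main

import "fmt"

// fallbackToolCallID returns the parser-supplied ID when present, otherwise a
// per-index ID derived from the request ID. The helper name is hypothetical;
// the format string matches the one used in buildDeferredToolCallChunks.
func fallbackToolCallID(requestID, parsedID string, index int) string {
	if parsedID != "" {
		return parsedID
	}
	return fmt.Sprintf("%s-%d", requestID, index)
}

func main() {
	parsed := []string{"", "call_abc", ""} // IDs as surfaced by the parser (made up)
	for i, p := range parsed {
		fmt.Println(fallbackToolCallID("chatcmpl-123", p, i))
	}
	// prints:
	// chatcmpl-123-0
	// call_abc
	// chatcmpl-123-2
}
```

With the earlier fallback of reusing the bare request ID, the first and third calls in this example would have shared one ID, leaving a client unable to tell their results apart.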
|
||||
717
core/http/endpoints/openai/chat_emit_test.go
Normal file
@@ -0,0 +1,717 @@
|
package openai

import (
	"fmt"

	"github.com/mudler/LocalAI/core/schema"
	"github.com/mudler/LocalAI/pkg/functions"
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// contentOf extracts the string payload from a chunk's delta.Content,
// transparently handling both *string and string underlying types so
// assertions don't have to care which one the helper produced.
func contentOf(ch schema.OpenAIResponse) string {
	if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
		return ""
	}
	switch v := ch.Choices[0].Delta.Content.(type) {
	case *string:
		if v == nil {
			return ""
		}
		return *v
	case string:
		return v
	default:
		return ""
	}
}

// reasoningOf mirrors contentOf for the delta.Reasoning field, which is a
// *string on schema.Message.
func reasoningOf(ch schema.OpenAIResponse) string {
	if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
		return ""
	}
	r := ch.Choices[0].Delta.Reasoning
	if r == nil {
		return ""
	}
	return *r
}

// toolCallsOf returns the ToolCalls slice of a chunk's delta, or nil.
func toolCallsOf(ch schema.OpenAIResponse) []schema.ToolCall {
	if len(ch.Choices) == 0 || ch.Choices[0].Delta == nil {
		return nil
	}
	return ch.Choices[0].Delta.ToolCalls
}

// expectSpecCompliant enforces the invariants on every chunk:
//   - Object == "chat.completion.chunk"
//   - Exactly one Choice with Index==0
//   - No delta ever carries both non-empty Content and non-empty ToolCalls
//   - No delta ever carries both non-empty Reasoning and non-empty ToolCalls
func expectSpecCompliant(chunks []schema.OpenAIResponse) {
	for i, ch := range chunks {
		Expect(ch.Object).To(Equal("chat.completion.chunk"), "chunk[%d] Object", i)
		Expect(ch.Choices).To(HaveLen(1), "chunk[%d] Choices length", i)
		Expect(ch.Choices[0].Index).To(Equal(0), "chunk[%d] Choices[0].Index", i)

		hasContent := contentOf(ch) != ""
		hasReasoning := reasoningOf(ch) != ""
		hasToolCalls := len(toolCallsOf(ch)) > 0

		if hasContent && hasToolCalls {
			Fail(fmt.Sprintf("chunk[%d] violates spec: Content and ToolCalls in same delta", i))
		}
		if hasReasoning && hasToolCalls {
			Fail(fmt.Sprintf("chunk[%d] violates spec: Reasoning and ToolCalls in same delta", i))
		}
	}
}

// expectMetadata asserts every chunk carries the same id/model/created.
func expectMetadata(chunks []schema.OpenAIResponse, id, model string, created int) {
	for i, ch := range chunks {
		Expect(ch.ID).To(Equal(id), "chunk[%d] ID", i)
		Expect(ch.Model).To(Equal(model), "chunk[%d] Model", i)
		Expect(ch.Created).To(Equal(created), "chunk[%d] Created", i)
	}
}

||||
var _ = Describe("buildDeferredToolCallChunks", func() {
|
||||
const (
|
||||
testID = "req"
|
||||
testModel = "test-model"
|
||||
testCreated = 1700000000
|
||||
)
|
||||
|
||||
Describe("Case A — primary bug: content already streamed, 1 deferred call", func() {
|
||||
It("emits only the tool_call chunks, no Content anywhere", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "search", Arguments: `{"q":"x"}`, ID: "tc1"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "Let me search…",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2), "two chunks: name, args")
|
||||
|
||||
// Name chunk
|
||||
tc0 := toolCallsOf(chunks[0])
|
||||
Expect(tc0).To(HaveLen(1))
|
||||
Expect(tc0[0].Index).To(Equal(0))
|
||||
Expect(tc0[0].ID).To(Equal("tc1"))
|
||||
Expect(tc0[0].FunctionCall.Name).To(Equal("search"))
|
||||
Expect(tc0[0].FunctionCall.Arguments).To(BeEmpty())
|
||||
Expect(contentOf(chunks[0])).To(BeEmpty())
|
||||
|
||||
// Args chunk — MUST NOT carry Content
|
||||
tc1 := toolCallsOf(chunks[1])
|
||||
Expect(tc1).To(HaveLen(1))
|
||||
Expect(tc1[0].FunctionCall.Name).To(BeEmpty())
|
||||
Expect(tc1[0].FunctionCall.Arguments).To(Equal(`{"q":"x"}`))
|
||||
Expect(contentOf(chunks[1])).To(BeEmpty(),
|
||||
"args chunk must not duplicate already-streamed content")
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case B — autoparser / content not streamed", func() {
|
||||
It("emits role, content, then name+args", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "do", Arguments: "{}", ID: "tc1"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
false, "Here is my plan…",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(4), "role, content, name, args")
|
||||
|
||||
// Role chunk
|
||||
Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
|
||||
Expect(contentOf(chunks[0])).To(BeEmpty())
|
||||
Expect(toolCallsOf(chunks[0])).To(BeEmpty())
|
||||
|
||||
// Content chunk
|
||||
Expect(contentOf(chunks[1])).To(Equal("Here is my plan…"))
|
||||
Expect(toolCallsOf(chunks[1])).To(BeEmpty())
|
||||
|
||||
// Name + args chunks
|
||||
Expect(toolCallsOf(chunks[2])).To(HaveLen(1))
|
||||
Expect(toolCallsOf(chunks[2])[0].FunctionCall.Name).To(Equal("do"))
|
||||
Expect(toolCallsOf(chunks[3])).To(HaveLen(1))
|
||||
Expect(toolCallsOf(chunks[3])[0].FunctionCall.Arguments).To(Equal("{}"))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case C — multiple deferred calls, content already streamed", func() {
|
||||
It("emits (name, args) × 3 with no Content anywhere", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tcA"},
|
||||
{Name: "b", Arguments: "{}", ID: "tcB"},
|
||||
{Name: "c", Arguments: "{}", ID: "tcC"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "some narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(6))
|
||||
|
||||
for i := 0; i < 3; i++ {
|
||||
Expect(contentOf(chunks[2*i])).To(BeEmpty(),
|
||||
"call #%d name chunk must not carry Content", i)
|
||||
Expect(contentOf(chunks[2*i+1])).To(BeEmpty(),
|
||||
"call #%d args chunk must not carry Content", i)
|
||||
Expect(toolCallsOf(chunks[2*i])[0].Index).To(Equal(i))
|
||||
Expect(toolCallsOf(chunks[2*i+1])[0].Index).To(Equal(i))
|
||||
}
|
||||
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("a"))
|
||||
Expect(toolCallsOf(chunks[2])[0].FunctionCall.Name).To(Equal("b"))
|
||||
Expect(toolCallsOf(chunks[4])[0].FunctionCall.Name).To(Equal("c"))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case D — partial incremental emission", func() {
|
||||
It("emits only the deferred tail (call #1), skipping #0", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc0"},
|
||||
{Name: "b", Arguments: "{}", ID: "tc1"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 1,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
Expect(toolCallsOf(chunks[0])[0].Index).To(Equal(1))
|
||||
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("b"))
|
||||
Expect(toolCallsOf(chunks[1])[0].Index).To(Equal(1))
|
||||
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal("{}"))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case E — all calls already emitted incrementally", func() {
|
||||
It("emits nothing", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc0"},
|
||||
{Name: "b", Arguments: "{}", ID: "tc1"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 2,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(BeEmpty())
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case F — content not streamed but textContent empty", func() {
|
||||
It("emits only the tool call chunks, no leading role/content", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "x", Arguments: "{}", ID: "tcX"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
false, "",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("x"))
|
||||
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal("{}"))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case G — empty ss.ID falls back to a unique per-index ID", func() {
|
||||
It("emits a deterministic per-index fallback", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "x", Arguments: "{}", ID: ""},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
expectedID := fmt.Sprintf("%s-%d", testID, 0)
|
||||
Expect(toolCallsOf(chunks[0])[0].ID).To(Equal(expectedID))
|
||||
Expect(toolCallsOf(chunks[1])[0].ID).To(Equal(expectedID))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case G2 — multiple empty IDs get distinct fallbacks", func() {
|
||||
It("avoids the collision bug where every empty-ID call shared the request id", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: ""},
|
||||
{Name: "b", Arguments: "{}", ID: ""},
|
||||
{Name: "c", Arguments: "{}", ID: ""},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(6))
|
||||
|
||||
ids := map[string]int{}
|
||||
for _, ch := range chunks {
|
||||
for _, tc := range toolCallsOf(ch) {
|
||||
ids[tc.ID]++
|
||||
}
|
||||
}
|
||||
// Each call yields a name chunk + args chunk → each distinct ID
|
||||
// should appear in exactly two chunks. Three distinct IDs
|
||||
// overall.
|
||||
Expect(ids).To(HaveLen(3), "three distinct per-index fallback IDs")
|
||||
for id, n := range ids {
|
||||
Expect(n).To(Equal(2), "ID %q should appear in exactly 2 chunks", id)
|
||||
}
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case H — indices preserved across skip with multiple calls", func() {
|
||||
It("emits Index fields matching functionResults positions", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc0"},
|
||||
{Name: "b", Arguments: "{}", ID: "tc1"},
|
||||
{Name: "c", Arguments: "{}", ID: "tc2"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 1,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(4))
|
||||
|
||||
Expect(toolCallsOf(chunks[0])[0].Index).To(Equal(1))
|
||||
Expect(toolCallsOf(chunks[1])[0].Index).To(Equal(1))
|
||||
Expect(toolCallsOf(chunks[2])[0].Index).To(Equal(2))
|
||||
Expect(toolCallsOf(chunks[3])[0].Index).To(Equal(2))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case I — explicit non-empty ID is preserved", func() {
|
||||
It("does not touch ss.ID when it's already set", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "x", Arguments: "{}", ID: "abc123"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
Expect(toolCallsOf(chunks[0])[0].ID).To(Equal("abc123"))
|
||||
Expect(toolCallsOf(chunks[1])[0].ID).To(Equal("abc123"))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case J — chunk-shape sanity", func() {
|
||||
It("splits Name into the first chunk and Arguments into the second", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "x", Arguments: `{"k":"v"}`, ID: "tcX"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "narration",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
|
||||
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Name).To(Equal("x"))
|
||||
Expect(toolCallsOf(chunks[0])[0].FunctionCall.Arguments).To(BeEmpty())
|
||||
|
||||
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Name).To(BeEmpty())
|
||||
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Arguments).To(Equal(`{"k":"v"}`))
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case K — metadata propagation", func() {
|
||||
It("stamps every chunk with the same id/model/created", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tcA"},
|
||||
{Name: "b", Arguments: "{}", ID: "tcB"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
false, "hello",
|
||||
true, "",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
expectMetadata(chunks, testID, testModel, testCreated)
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case L — Choices[0].Index == 0 invariant", func() {
|
||||
It("is upheld across every branch the helper can take", func() {
|
||||
scenarios := []struct {
|
||||
name string
|
||||
functionResults []functions.FuncCallResults
|
||||
lastEmittedCount int
|
||||
contentStreamed bool
|
||||
text string
|
||||
reasoningStreamed bool
|
||||
reasoning string
|
||||
}{
|
||||
{"streamed-content-deferred-call",
|
||||
[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
|
||||
0, true, "hi", true, ""},
|
||||
{"unstreamed-content-deferred-call",
|
||||
[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
|
||||
0, false, "hello", true, ""},
|
||||
{"unstreamed-reasoning-and-content",
|
||||
[]functions.FuncCallResults{{Name: "a", Arguments: "{}"}},
|
||||
0, false, "hello", false, "thinking…"},
|
||||
{"partial-incremental",
|
||||
[]functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}"},
|
||||
{Name: "b", Arguments: "{}"}},
|
||||
1, true, "hi", true, ""},
|
||||
}
|
||||
for _, sc := range scenarios {
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
sc.functionResults, sc.lastEmittedCount,
|
||||
sc.contentStreamed, sc.text,
|
||||
sc.reasoningStreamed, sc.reasoning,
|
||||
)
|
||||
for i, ch := range chunks {
|
||||
Expect(ch.Choices[0].Index).To(Equal(0),
|
||||
"scenario %q chunk[%d] Choices[0].Index", sc.name, i)
|
||||
}
|
||||
}
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case M — spec compliance across every scenario", func() {
|
||||
It("never mixes Content or Reasoning with ToolCalls in a single delta", func() {
|
||||
scenarios := []struct {
|
||||
name string
|
||||
functionResults []functions.FuncCallResults
|
||||
lastEmittedCount int
|
||||
contentStreamed bool
|
||||
text string
|
||||
reasoningStreamed bool
|
||||
reasoning string
|
||||
}{
|
||||
{"A", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
|
||||
0, true, "already-streamed", true, ""},
|
||||
{"C", []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc0"},
|
||||
{Name: "b", Arguments: "{}", ID: "tc1"}},
|
||||
0, true, "already-streamed", true, ""},
|
||||
{"B", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
|
||||
0, false, "plan", true, ""},
|
||||
{"Reasoning-deferred", []functions.FuncCallResults{{Name: "a", Arguments: "{}", ID: "tc"}},
|
||||
0, false, "plan", false, "thinking…"},
|
||||
}
|
||||
for _, sc := range scenarios {
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
sc.functionResults, sc.lastEmittedCount,
|
||||
sc.contentStreamed, sc.text,
|
||||
sc.reasoningStreamed, sc.reasoning,
|
||||
)
|
||||
for i, ch := range chunks {
|
||||
hasContent := contentOf(ch) != ""
|
||||
hasReasoning := reasoningOf(ch) != ""
|
||||
hasToolCalls := len(toolCallsOf(ch)) > 0
|
||||
Expect(hasContent && hasToolCalls).To(BeFalse(),
|
||||
"scenario %q chunk[%d] mixes Content with ToolCalls", sc.name, i)
|
||||
Expect(hasReasoning && hasToolCalls).To(BeFalse(),
|
||||
"scenario %q chunk[%d] mixes Reasoning with ToolCalls", sc.name, i)
|
||||
}
|
||||
}
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case N — empty functionResults", func() {
|
||||
It("emits nothing, including no leading role/content/reasoning", func() {
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
nil, 0,
|
||||
false, "ignored",
|
||||
false, "ignored",
|
||||
)
|
||||
Expect(chunks).To(BeEmpty())
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Case O — content not streamed but all calls already emitted", func() {
|
||||
It("emits nothing, not even a standalone content chunk", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc0"},
|
||||
{Name: "b", Arguments: "{}", ID: "tc1"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 2,
|
||||
false, "narration",
|
||||
false, "thinking…",
|
||||
)
|
||||
Expect(chunks).To(BeEmpty(),
|
||||
"no tool_calls to trigger on, so no leading role/content/reasoning either")
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Reasoning — autoparser delivered reasoning only at end", func() {
|
||||
It("emits a leading reasoning chunk when !reasoningAlreadyStreamed", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "streamed content",
|
||||
false, "model's private thoughts",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(3), "reasoning, name, args")
|
||||
|
||||
Expect(reasoningOf(chunks[0])).To(Equal("model's private thoughts"))
|
||||
Expect(contentOf(chunks[0])).To(BeEmpty())
|
||||
Expect(toolCallsOf(chunks[0])).To(BeEmpty())
|
||||
|
||||
// The following two are the tool_call name + args chunks.
|
||||
Expect(toolCallsOf(chunks[1])[0].FunctionCall.Name).To(Equal("a"))
|
||||
Expect(toolCallsOf(chunks[2])[0].FunctionCall.Arguments).To(Equal("{}"))
|
||||
})
|
||||
|
||||
It("emits reasoning before role+content when neither was streamed", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
false, "final plan",
|
||||
false, "private thoughts",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(5), "reasoning, role, content, name, args")
|
||||
|
||||
Expect(reasoningOf(chunks[0])).To(Equal("private thoughts"))
|
||||
Expect(chunks[1].Choices[0].Delta.Role).To(Equal("assistant"))
|
||||
Expect(contentOf(chunks[2])).To(Equal("final plan"))
|
||||
Expect(toolCallsOf(chunks[3])[0].FunctionCall.Name).To(Equal("a"))
|
||||
Expect(toolCallsOf(chunks[4])[0].FunctionCall.Arguments).To(Equal("{}"))
|
||||
})
|
||||
|
||||
It("does not re-emit reasoning that was already streamed", func() {
|
||||
results := []functions.FuncCallResults{
|
||||
{Name: "a", Arguments: "{}", ID: "tc"},
|
||||
}
|
||||
chunks := buildDeferredToolCallChunks(
|
||||
testID, testModel, testCreated,
|
||||
results, 0,
|
||||
true, "streamed",
|
||||
true, "already-sent reasoning",
|
||||
)
|
||||
|
||||
expectSpecCompliant(chunks)
|
||||
Expect(chunks).To(HaveLen(2), "only name + args; no reasoning re-emission")
|
||||
for _, ch := range chunks {
|
||||
Expect(reasoningOf(ch)).To(BeEmpty())
|
||||
}
|
||||
})
|
||||
})
|
||||
})
|
||||
|
var _ = Describe("hasRealCall", func() {
	const noAction = "answer"

	It("returns false for nil and empty slices", func() {
		Expect(hasRealCall(nil, noAction)).To(BeFalse())
		Expect(hasRealCall([]functions.FuncCallResults{}, noAction)).To(BeFalse())
	})

	It("returns false when every entry is the noAction sentinel", func() {
		results := []functions.FuncCallResults{
			{Name: noAction, Arguments: `{"message":"hi"}`},
			{Name: noAction, Arguments: `{"message":"hello"}`},
		}
		Expect(hasRealCall(results, noAction)).To(BeFalse())
	})

	It("returns true when only one entry is a real call", func() {
		results := []functions.FuncCallResults{
			{Name: "search", Arguments: "{}"},
		}
		Expect(hasRealCall(results, noAction)).To(BeTrue())
	})

	It("returns true when a real call follows a noAction entry", func() {
		// This is the regression the follow-up fixes: the old
		// functionResults[0].Name == noAction check would declare this
		// noActionToRun and drop the real call entirely.
		results := []functions.FuncCallResults{
			{Name: noAction, Arguments: `{"message":"hi"}`},
			{Name: "search", Arguments: "{}"},
		}
		Expect(hasRealCall(results, noAction)).To(BeTrue())
	})

	It("returns true when a real call precedes a noAction entry", func() {
		results := []functions.FuncCallResults{
			{Name: "search", Arguments: "{}"},
			{Name: noAction, Arguments: `{"message":"hi"}`},
		}
		Expect(hasRealCall(results, noAction)).To(BeTrue())
	})
})

||||
var _ = Describe("buildNoActionFinalChunks", func() {
|
||||
const (
|
||||
testID = "req"
|
||||
testModel = "test-model"
|
||||
testCreated = 1700000000
|
||||
)
|
||||
usage := schema.OpenAIUsage{PromptTokens: 5, CompletionTokens: 7, TotalTokens: 12}
|
||||
|
||||
Describe("Content streamed — trailing usage chunk", func() {
|
||||
It("emits just one chunk with usage, no content, no reasoning when reasoning was streamed", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
true, true,
|
||||
"", "already-streamed-reasoning", usage,
|
||||
)
|
||||
|
||||
Expect(chunks).To(HaveLen(1))
|
||||
Expect(chunks[0].Usage.TotalTokens).To(Equal(12))
|
||||
Expect(contentOf(chunks[0])).To(BeEmpty())
|
||||
Expect(reasoningOf(chunks[0])).To(BeEmpty(),
|
||||
"reasoning must not be re-emitted once it was streamed via the callback")
|
||||
})
|
||||
|
||||
It("emits a trailing reasoning delivery when reasoning came only at end", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
true, false,
|
||||
"", "autoparser final reasoning", usage,
|
||||
)
|
||||
|
||||
Expect(chunks).To(HaveLen(1))
|
||||
Expect(reasoningOf(chunks[0])).To(Equal("autoparser final reasoning"))
|
||||
Expect(contentOf(chunks[0])).To(BeEmpty())
|
||||
Expect(chunks[0].Usage.TotalTokens).To(Equal(12))
|
||||
})
|
||||
|
||||
It("omits reasoning when it's empty regardless of streamed flag", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
true, false,
|
||||
"", "", usage,
|
||||
)
|
||||
|
||||
Expect(chunks).To(HaveLen(1))
|
||||
Expect(reasoningOf(chunks[0])).To(BeEmpty())
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Content not streamed — role, then content+usage", func() {
|
||||
It("emits role chunk then content chunk without reasoning when reasoning was streamed", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
false, true,
|
||||
"the answer", "already-streamed-reasoning", usage,
|
||||
)
|
||||
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
|
||||
Expect(contentOf(chunks[0])).To(BeEmpty())
|
||||
|
||||
Expect(contentOf(chunks[1])).To(Equal("the answer"))
|
||||
Expect(reasoningOf(chunks[1])).To(BeEmpty(),
|
||||
"reasoning must not be re-emitted if it was streamed earlier")
|
||||
Expect(chunks[1].Usage.TotalTokens).To(Equal(12))
|
||||
})
|
||||
|
||||
It("emits role, then content+reasoning when reasoning was not streamed", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
false, false,
|
||||
"the answer", "autoparser final reasoning", usage,
|
||||
)
|
||||
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
Expect(chunks[0].Choices[0].Delta.Role).To(Equal("assistant"))
|
||||
|
||||
Expect(contentOf(chunks[1])).To(Equal("the answer"))
|
||||
Expect(reasoningOf(chunks[1])).To(Equal("autoparser final reasoning"))
|
||||
Expect(chunks[1].Usage.TotalTokens).To(Equal(12))
|
||||
})
|
||||
|
||||
It("still emits content even when reasoning is empty", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
false, false,
|
||||
"just an answer", "", usage,
|
||||
)
|
||||
|
||||
Expect(chunks).To(HaveLen(2))
|
||||
Expect(contentOf(chunks[1])).To(Equal("just an answer"))
|
||||
Expect(reasoningOf(chunks[1])).To(BeEmpty())
|
||||
})
|
||||
})
|
||||
|
||||
Describe("Metadata and shape invariants", func() {
|
||||
It("stamps every chunk with the same id/model/created and object", func() {
|
||||
chunks := buildNoActionFinalChunks(
|
||||
testID, testModel, testCreated,
|
||||
false, false,
|
||||
"hi", "reasoning", usage,
|
||||
)
|
||||
for i, ch := range chunks {
|
||||
Expect(ch.ID).To(Equal(testID), "chunk[%d] ID", i)
|
||||
Expect(ch.Model).To(Equal(testModel), "chunk[%d] Model", i)
|
||||
Expect(ch.Created).To(Equal(testCreated), "chunk[%d] Created", i)
|
||||
Expect(ch.Object).To(Equal("chat.completion.chunk"), "chunk[%d] Object", i)
|
||||
Expect(ch.Choices).To(HaveLen(1))
|
||||
Expect(ch.Choices[0].Index).To(Equal(0))
|
||||
}
|
||||
})
|
||||
})
|
||||
})
|
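The `hasRealCall` specs in this file pin down a simple contract: false for nil or empty input, false when every entry is the noAction sentinel, and true as soon as any entry names something else, wherever it sits in the slice. The sketch below is one way to satisfy that contract; it is not the implementation from this change, which is not shown in this part of the diff. The names `call`, `anyRealCall`, and `firstIsNoAction` are hypothetical stand-ins, the last one modelling the older first-element check that the spec comment describes.

```go
package main

import "fmt"

// call is a hypothetical stand-in for functions.FuncCallResults; only the
// Name field matters for this check.
type call struct {
	Name      string
	Arguments string
}

// anyRealCall reports whether at least one entry is not the noAction
// sentinel. It scans the whole slice, so position does not matter.
func anyRealCall(results []call, noAction string) bool {
	for _, r := range results {
		if r.Name != noAction {
			return true
		}
	}
	return false
}

// firstIsNoAction models the older check, which looked only at the first
// entry and so treated a mixed slice as "nothing to run".
func firstIsNoAction(results []call, noAction string) bool {
	return len(results) > 0 && results[0].Name == noAction
}

func main() {
	const noAction = "answer"
	mixed := []call{
		{Name: noAction, Arguments: `{"message":"hi"}`},
		{Name: "search", Arguments: "{}"},
	}
	fmt.Println("old check says no action to run:", firstIsNoAction(mixed, noAction))
	fmt.Println("full scan finds a real call:", anyRealCall(mixed, noAction))
}
```

On the mixed slice the two checks disagree, which is the regression the "real call follows a noAction entry" spec guards against.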
@@ -3,6 +3,7 @@ package middleware
 import (
 	"bytes"
 	"io"
+	"mime"
 	"net/http"
 	"slices"
 	"sync"
@@ -94,7 +95,8 @@ func TraceMiddleware(app *application.Application) echo.MiddlewareFunc {

 initializeTracing(app.ApplicationConfig().TracingMaxItems)

-if c.Request().Header.Get("Content-Type") != "application/json" {
+ct, _, _ := mime.ParseMediaType(c.Request().Header.Get("Content-Type"))
+if ct != "application/json" {
 	return next(c)
 }

||||
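The middleware hunk above replaces an exact string comparison on the `Content-Type` header with a comparison on the parsed media type. A small sketch of why that matters: `mime.ParseMediaType` from the Go standard library returns the media type lowercased and with parameters such as `charset` split off, so headers like `application/json; charset=utf-8` are still recognised as JSON. The helper name `isJSONContentType` is hypothetical and exists only for this illustration; like the diff, it ignores the parse error and relies on the returned type.

```go
package main

import (
	"fmt"
	"mime"
)

// isJSONContentType reports whether a Content-Type header value names the
// application/json media type, ignoring parameters and letter case.
// Hypothetical helper; it mirrors the diff by discarding the parse error.
func isJSONContentType(header string) bool {
	ct, _, _ := mime.ParseMediaType(header)
	return ct == "application/json"
}

func main() {
	headers := []string{
		"application/json",
		"application/json; charset=utf-8",
		"Application/JSON",
		"text/plain",
		"",
	}
	for _, h := range headers {
		exact := h == "application/json"
		fmt.Printf("%q: exact match=%v, parsed match=%v\n", h, exact, isJSONContentType(h))
	}
}
```

With the exact comparison, only the first header counts as JSON; with the parsed comparison, the first three do.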
@@ -1529,6 +1529,401 @@ select.input {
|
||||
background: var(--color-warning-light);
|
||||
color: var(--color-warning);
|
||||
}
|
||||
.badge-accent {
|
||||
background: var(--color-accent-light);
|
||||
color: var(--color-accent);
|
||||
}
|
||||
|
||||
/* Horizontal row of badges used inside table cells — consistent spacing so
|
||||
cells line up regardless of how many badges are present. */
|
||||
.badge-row {
|
||||
display: inline-flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 4px;
|
||||
align-items: center;
|
||||
}
|
||||
|
||||
/* Vertically stacked cell content (e.g. version + update chip + drift chip).
|
||||
Keeps rows readable at scale without inline style={{...}} everywhere. */
|
||||
.cell-stack {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 4px;
|
||||
align-items: flex-start;
|
||||
}
|
||||
|
||||
.cell-mono {
|
||||
font-family: 'JetBrains Mono', ui-monospace, monospace;
|
||||
font-size: var(--text-xs);
|
||||
color: var(--color-text-primary);
|
||||
}
|
||||
|
||||
.cell-muted {
|
||||
color: var(--color-text-muted);
|
||||
font-size: var(--text-xs);
|
||||
}
|
||||
|
||||
.cell-subtle {
|
||||
color: var(--color-text-muted);
|
||||
font-size: var(--text-xs);
|
||||
font-weight: 400;
|
||||
margin-left: 8px;
|
||||
}
|
||||
|
||||
.cell-name {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
gap: var(--spacing-xs);
|
||||
font-weight: 500;
|
||||
}
|
||||
.cell-name > i {
|
||||
color: var(--color-accent);
|
||||
font-size: var(--text-xs);
|
||||
}
|
||||
|
||||
.row-actions {
|
||||
display: flex;
|
||||
gap: var(--spacing-xs);
|
||||
justify-content: flex-end;
|
||||
align-items: center;
|
||||
}
|
||||
|
||||
/* Softer delete button for dense tables — the destructive confirm dialog
|
||||
already owns the "are you sure" affordance, so the button itself doesn't
|
||||
need to scream. Keeps the delete red readable without dominating rows. */
|
||||
.btn.btn-danger-ghost {
|
||||
background: transparent;
|
||||
color: var(--color-error);
|
||||
border-color: transparent;
|
||||
}
|
||||
.btn.btn-danger-ghost:hover:not(:disabled) {
|
||||
background: var(--color-error-light);
|
||||
color: var(--color-error);
|
||||
border-color: var(--color-error-light);
|
||||
}
|
||||
|
||||
/* Small count pill used inside tabs ("(3) ↑ 2") so update counts are
|
||||
glanceable without extra rows of UI. */
|
||||
.tab-pill {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
gap: 3px;
|
||||
margin-left: 6px;
|
||||
padding: 1px 6px;
|
||||
border-radius: var(--radius-full);
|
||||
font-size: var(--text-xs);
|
||||
font-weight: 600;
|
||||
line-height: 1.4;
|
||||
}
|
||||
.tab-pill--warning {
|
||||
background: var(--color-warning-light);
|
||||
color: var(--color-warning);
|
||||
}
|
||||
|
||||
/* Stat cards — uniform-height cluster metrics for the Nodes dashboard.
|
||||
Left accent bar ties the color to the metric's semantic (success/warning/
|
||||
error/primary), icon chip sits top-right, value is left-aligned and
|
||||
prominent so you can scan a row of cards without reading labels. */
|
||||
.stat-grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fill, minmax(180px, 1fr));
|
||||
gap: var(--spacing-md);
|
||||
margin-bottom: var(--spacing-xl);
|
||||
}
|
||||
|
||||
.stat-card {
|
||||
position: relative;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
gap: var(--spacing-sm);
|
||||
padding: var(--spacing-md);
|
||||
min-height: 96px;
|
||||
background: var(--color-bg-raised, var(--color-bg-secondary));
|
||||
border: 1px solid var(--color-border-subtle);
|
||||
border-radius: var(--radius-lg);
|
||||
transition: transform var(--duration-fast) var(--ease-default),
|
||||
box-shadow var(--duration-fast) var(--ease-default),
|
||||
border-color var(--duration-fast) var(--ease-default);
|
||||
overflow: hidden;
|
||||
}
|
||||
.stat-card::before {
|
||||
content: '';
|
||||
position: absolute;
|
||||
left: 0; top: 0; bottom: 0;
|
||||
width: 3px;
|
||||
background: var(--stat-accent, var(--color-border-subtle));
|
||||
transition: background var(--duration-fast) var(--ease-default);
|
||||
}
|
||||
.stat-card:hover {
|
||||
transform: translateY(-1px);
|
||||
box-shadow: var(--shadow-sm);
|
||||
border-color: var(--color-border);
|
||||
}
|
||||
|
||||
.stat-card__body {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 6px;
|
||||
min-width: 0;
|
||||
}
|
||||
.stat-card__label {
|
||||
font-size: var(--text-xs);
|
||||
font-weight: 600;
|
||||
letter-spacing: 0.08em;
|
||||
text-transform: uppercase;
|
||||
color: var(--color-text-muted);
|
||||
white-space: normal;
|
||||
line-height: 1.2;
|
||||
}
|
||||
.stat-card__value {
|
||||
font-size: var(--text-2xl);
|
||||
font-weight: 600;
|
||||
font-family: 'JetBrains Mono', ui-monospace, monospace;
|
||||
line-height: 1;
|
||||
color: var(--color-text-primary);
|
||||
word-break: break-word;
|
||||
}
|
||||
.stat-card__icon {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
width: 36px;
|
||||
height: 36px;
|
||||
border-radius: var(--radius-md);
|
||||
background: color-mix(in srgb, var(--stat-accent, var(--color-text-muted)) 12%, transparent);
|
||||
color: var(--stat-accent, var(--color-text-muted));
|
||||
font-size: var(--text-lg);
|
||||
flex-shrink: 0;
|
||||
}
|
||||
|
||||
/* Subtle "Register a new worker" trigger replacing the broken-text chevron
|
||||
link. Still opens the same hint card — just reads like a button now. */
|
||||
.nodes-add-worker {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
gap: var(--spacing-xs);
|
||||
padding: var(--spacing-xs) var(--spacing-sm);
|
||||
background: transparent;
|
||||
border: 1px dashed var(--color-border);
|
||||
border-radius: var(--radius-md);
|
||||
color: var(--color-text-secondary);
|
||||
font-size: var(--text-sm);
|
||||
font-family: inherit;
|
||||
font-weight: 500;
|
||||
cursor: pointer;
|
||||
margin-bottom: var(--spacing-md);
|
||||
transition: background var(--duration-fast) var(--ease-default),
|
||||
border-color var(--duration-fast) var(--ease-default),
|
||||
color var(--duration-fast) var(--ease-default);
|
||||
}
|
||||
.nodes-add-worker:hover {
|
||||
background: var(--color-bg-raised, var(--color-bg-secondary));
|
||||
border-color: var(--color-border-strong);
|
||||
color: var(--color-text-primary);
|
||||
}
|
||||
|
||||
/* Shared FilterBar layout — search strip + chip row + toggle strip. Lives
|
||||
outside the .filter-bar chip row so the padding and wrapping behavior is
|
||||
consistent between the Backends gallery and the System tabs. */
|
||||
.filter-bar-group {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: var(--spacing-sm);
|
||||
margin-bottom: var(--spacing-md);
|
||||
}
|
||||
.filter-bar-group__search {
|
||||
min-width: 200px;
|
||||
flex: 1;
|
||||
}
|
||||
.filter-bar-group__row {
|
||||
display: flex;
|
||||
gap: var(--spacing-md);
|
||||
align-items: center;
|
||||
flex-wrap: wrap;
|
||||
}
|
||||
.filter-bar-group__right {
|
||||
display: flex;
|
||||
gap: var(--spacing-md);
|
||||
align-items: center;
|
||||
flex-wrap: wrap;
|
||||
padding-left: var(--spacing-md);
|
||||
border-left: 1px solid var(--color-border-subtle);
|
||||
}
|
||||
.filter-bar-group__toggle {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: var(--spacing-xs);
|
||||
font-size: var(--text-xs);
|
||||
color: var(--color-text-secondary);
|
||||
cursor: pointer;
|
||||
user-select: none;
|
||||
white-space: nowrap;
|
||||
}
|
||||
.filter-btn__count {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
margin-left: 6px;
|
||||
min-width: 18px;
|
||||
padding: 0 5px;
|
||||
background: color-mix(in srgb, currentColor 18%, transparent);
|
||||
border-radius: var(--radius-full);
|
||||
font-size: 0.625rem;
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
/* Popover — floating surface anchored to a trigger element. Uses the .card
|
||||
base so theming is free, adds z-index + fixed-position + scroll cap so it
|
||||
behaves on tables with many rows. Kept deliberately unstyled beyond that
|
||||
— content is expected to provide its own header/body structure. */
|
||||
.popover {
|
||||
position: fixed;
|
||||
z-index: 200;
|
||||
min-width: 260px;
|
||||
max-width: min(420px, 95vw);
|
||||
max-height: min(420px, 70vh);
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
padding: 0; /* sections provide their own padding */
|
||||
overflow: hidden;
|
||||
box-shadow: var(--shadow-lg);
|
||||
animation: popoverIn var(--duration-fast) var(--ease-default);
|
||||
}
|
||||
|
||||
@keyframes popoverIn {
|
||||
from { opacity: 0; transform: translateY(-4px) scale(0.98); }
|
||||
to { opacity: 1; transform: translateY(0) scale(1); }
|
||||
}
|
||||
|
||||
.popover__header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: var(--spacing-sm);
|
||||
padding: var(--spacing-sm) var(--spacing-md);
|
||||
border-bottom: 1px solid var(--color-border-subtle);
|
||||
font-size: var(--text-sm);
|
||||
}
|
||||
|
||||
.popover__scroll {
|
||||
overflow: auto;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
.popover__table {
|
||||
margin: 0;
|
||||
width: 100%;
|
||||
}
|
||||
.popover__table th {
|
||||
position: sticky;
|
||||
top: 0;
|
||||
background: var(--color-bg-raised, var(--color-bg-secondary));
|
||||
z-index: 1;
|
||||
}
|
||||
|
||||
/* Inline-table chip trigger — looks like a badge but is a button (cursor,
|
||||
focus ring inherited from global :focus-visible). */
|
||||
.chip-trigger {
|
||||
border: none;
|
||||
cursor: pointer;
|
||||
font-family: inherit;
|
||||
}
|
||||
.chip-trigger:hover {
|
||||
filter: brightness(1.08);
|
||||
}
|
||||
|
||||
/* Truncate + ellipsize a long cell (e.g. OCI digest) without breaking the
|
||||
table layout. Tooltip preserves the full value. */
|
||||
.cell-truncate {
|
||||
max-width: 160px;
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
white-space: nowrap;
|
||||
}
|
||||
|
||||
/* Compact empty-state used inside expanded drawer sections (e.g. "No
|
||||
models loaded on this node"). Dimmer than the page-level .empty-state
|
||||
because it lives inside another container and shouldn't compete with
|
||||
the row's primary content. */
|
||||
.drawer-empty {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: var(--spacing-sm);
|
||||
padding: var(--spacing-sm) var(--spacing-md);
|
||||
background: var(--color-bg-tertiary);
|
||||
border: 1px dashed var(--color-border-subtle);
|
||||
border-radius: var(--radius-md);
|
||||
color: var(--color-text-muted);
|
||||
font-size: var(--text-sm);
|
||||
}
|
||||
.drawer-empty > i {
|
||||
font-size: var(--text-sm);
|
||||
color: var(--color-text-muted);
|
||||
opacity: 0.8;
|
||||
}
|
||||
|
||||
/* Node-status indicator — replaces the tiny bullet with a proper LED-style
|
||||
dot next to a bold status label. Colors are applied inline from statusConfig
|
||||
so one primitive handles healthy/unhealthy/draining/pending in one shape. */
|
||||
.node-status {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
gap: 8px;
|
||||
font-size: var(--text-sm);
|
||||
font-weight: 600;
|
||||
}
|
||||
.node-status__dot {
|
||||
width: 8px;
|
||||
height: 8px;
|
||||
border-radius: 50%;
|
||||
box-shadow: 0 0 0 3px color-mix(in srgb, currentColor 15%, transparent);
|
||||
flex-shrink: 0;
|
||||
}
|
||||
|
||||
/* Row-chevron cell — small 20px toggle used in table rows that expand.
|
||||
The row itself is still clickable; the chevron provides the visible
|
||||
affordance users were missing. */
|
||||
.row-chevron {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
width: 20px;
|
||||
height: 20px;
|
||||
font-size: var(--text-xs);
|
||||
color: var(--color-text-muted);
|
||||
transition: transform var(--duration-fast) var(--ease-default);
|
||||
}
|
||||
.row-chevron.is-expanded {
|
||||
transform: rotate(90deg);
|
||||
color: var(--color-text-primary);
|
||||
}
|
||||
|
||||
/* Upgrade banner — the yellow strip operators see when updates are available.
|
||||
Mirrors the gallery so both pages speak the same visual language. */
|
||||
.upgrade-banner {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
gap: var(--spacing-md);
|
||||
padding: var(--spacing-sm) var(--spacing-md);
|
||||
margin-bottom: var(--spacing-md);
|
||||
background: var(--color-warning-light);
|
||||
border: 1px solid var(--color-warning);
|
||||
border-radius: var(--radius-md);
|
||||
color: var(--color-warning);
|
||||
}
|
||||
.upgrade-banner__text {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
gap: var(--spacing-sm);
|
||||
font-weight: 500;
|
||||
font-size: var(--text-sm);
|
||||
}
|
||||
.upgrade-banner__actions {
|
||||
display: inline-flex;
|
||||
gap: var(--spacing-xs);
|
||||
align-items: center;
|
||||
}
|
||||
|
||||
/* Tabs */
|
||||
.tabs {
|
||||
|
||||
87
core/http/react-ui/src/components/FilterBar.jsx
Normal file
@@ -0,0 +1,87 @@
|
||||
import Toggle from './Toggle'
|
||||
|
||||
// FilterBar is the shared search + chip filter + toggles control strip that
|
||||
// the Backends gallery pioneered. Pulled into its own component so the System
|
||||
// page's two tabs stop looking like a different app — matching visual
|
||||
// grammar + matching keyboard behavior.
|
||||
//
|
||||
// Props:
|
||||
// search: controlled value for the search input.
|
||||
// onSearchChange: (value) => void; null disables the search input entirely.
|
||||
// searchPlaceholder: placeholder for the search input.
|
||||
// filters: [{ key, label, icon?, count? }]; activeFilter is compared by key.
|
||||
// Omit to hide the chip row.
|
||||
// activeFilter: currently-selected filter key (use '' for "all" if
|
||||
// that's the first entry in `filters`).
|
||||
// onFilterChange: (key) => void.
|
||||
// toggles: [{ key, label, icon?, checked, onChange }]; optional
|
||||
// right-side toggle group (e.g. "Show all", "Development").
|
||||
// rightSlot: arbitrary element rendered after the toggles — use for
|
||||
// sort controls or extra buttons.
|
||||
export default function FilterBar({
|
||||
search,
|
||||
onSearchChange,
|
||||
searchPlaceholder = 'Search...',
|
||||
filters,
|
||||
activeFilter,
|
||||
onFilterChange,
|
||||
toggles,
|
||||
rightSlot,
|
||||
}) {
|
||||
const hasFilters = Array.isArray(filters) && filters.length > 0
|
||||
const hasToggles = Array.isArray(toggles) && toggles.length > 0
|
||||
|
||||
return (
|
||||
<div className="filter-bar-group">
|
||||
{onSearchChange && (
|
||||
<div className="search-bar filter-bar-group__search">
|
||||
<i className="fas fa-search search-icon" />
|
||||
<input
|
||||
className="input"
|
||||
placeholder={searchPlaceholder}
|
||||
value={search ?? ''}
|
||||
onChange={e => onSearchChange(e.target.value)}
|
||||
aria-label={searchPlaceholder}
|
||||
/>
|
||||
</div>
|
||||
)}
|
||||
|
||||
{(hasFilters || hasToggles || rightSlot) && (
|
||||
<div className="filter-bar-group__row">
|
||||
{hasFilters && (
|
||||
<div className="filter-bar" role="tablist" aria-label="Filter">
|
||||
{filters.map(f => (
|
||||
<button
|
||||
key={f.key}
|
||||
role="tab"
|
||||
aria-selected={activeFilter === f.key}
|
||||
className={`filter-btn ${activeFilter === f.key ? 'active' : ''}`}
|
||||
onClick={() => onFilterChange(f.key)}
|
||||
>
|
||||
{f.icon && <i className={`fas ${f.icon}`} style={{ marginRight: 4 }} />}
|
||||
{f.label}
|
||||
{typeof f.count === 'number' && (
|
||||
<span className="filter-btn__count">{f.count}</span>
|
||||
)}
|
||||
</button>
|
||||
))}
|
||||
</div>
|
||||
)}
|
||||
|
||||
{(hasToggles || rightSlot) && (
|
||||
<div className="filter-bar-group__right">
|
||||
{hasToggles && toggles.map(t => (
|
||||
<label key={t.key} className="filter-bar-group__toggle">
|
||||
<Toggle checked={t.checked} onChange={t.onChange} />
|
||||
{t.icon && <i className={`fas ${t.icon}`} />}
|
||||
{t.label}
|
||||
</label>
|
||||
))}
|
||||
{rightSlot}
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
)
|
||||
}
|
||||
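The chip row renders an optional numeric `count` per filter inside `.filter-btn__count`. A minimal sketch of how a caller might derive those counts is shown here; the `buildFilters` helper, the per-filter `test` predicate, and the sample `items` are hypothetical and not part of this change.

```javascript
// Hypothetical helper (not in the diff): attach a numeric `count` to each
// filter definition so FilterBar can render it next to the label.
function buildFilters(definitions, items) {
  return definitions.map(({ key, label, icon, test }) => ({
    key,
    label,
    icon,
    // `test` is an assumed per-filter predicate; a filter without one
    // (such as "all") counts every item.
    count: test ? items.filter(test).length : items.length,
  }))
}

// Illustrative data only.
const items = [
  { id: 'a', disabled: false, pinned: true },
  { id: 'b', disabled: true, pinned: false },
  { id: 'c', disabled: false, pinned: false },
]

const filters = buildFilters(
  [
    { key: 'all', label: 'All', icon: 'fa-layer-group' },
    { key: 'disabled', label: 'Disabled', icon: 'fa-ban', test: m => !!m.disabled },
    { key: 'pinned', label: 'Pinned', icon: 'fa-thumbtack', test: m => !!m.pinned },
  ],
  items,
)

console.log(filters.map(f => `${f.label}: ${f.count}`).join(', '))
```

The resulting array could be passed as the `filters` prop; entries without a numeric `count` simply render no badge, since the component checks `typeof f.count === 'number'`.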
168
core/http/react-ui/src/components/NodeDistributionChip.jsx
Normal file
@@ -0,0 +1,168 @@
|
||||
import { useRef, useState } from 'react'
|
||||
import Popover from './Popover'
|
||||
|
||||
// NodeDistributionChip shows where something is installed/loaded across a
|
||||
// cluster. Used by both Manage → Backends (per-row Nodes column, data =
|
||||
// gallery NodeBackendRef with version/digest) and by the Models tab (data =
|
||||
// LoadedOn with state/status). Supports arbitrary cluster size — small
|
||||
// clusters render node-name chips inline, larger clusters collapse to a
|
||||
// summary chip and reveal the full per-node table in a popover on click.
|
||||
//
|
||||
// Field names are intentionally forgiving: both {node_name, node_status} and
|
||||
// {NodeName, NodeStatus} are supported so the component works whether it's
|
||||
// reading directly off the JSON or off a hydrated class.
|
||||
//
|
||||
// Props:
|
||||
// nodes: array of node refs (see shape below).
|
||||
// compactThreshold: max nodes to render inline before collapsing (default 3).
|
||||
// context: 'backends' (default) shows version/digest; 'models'
|
||||
// shows state.
|
||||
// emptyLabel: what to render when nodes is empty (default "—").
|
||||
export default function NodeDistributionChip({
|
||||
nodes,
|
||||
compactThreshold = 3,
|
||||
context = 'backends',
|
||||
emptyLabel = '—',
|
||||
}) {
|
||||
const triggerRef = useRef(null)
|
||||
const [open, setOpen] = useState(false)
|
||||
|
||||
const list = Array.isArray(nodes) ? nodes : []
|
||||
if (list.length === 0) {
|
||||
return <span className="cell-muted">{emptyLabel}</span>
|
||||
}
|
||||
|
||||
const getName = n => n.node_name ?? n.NodeName ?? ''
|
||||
const getStatus = n => n.node_status ?? n.NodeStatus ?? ''
|
||||
const getState = n => n.state ?? n.State ?? ''
|
||||
const getVersion = n => n.version ?? n.Version ?? ''
|
||||
const getDigest = n => n.digest ?? n.Digest ?? ''
|
||||
|
||||
// Inline mode: render every node as its own chip. Good for small clusters
|
||||
// where seeing the names directly is more useful than a summary.
|
||||
if (list.length <= compactThreshold) {
|
||||
return (
|
||||
<div className="badge-row">
|
||||
{list.map(n => {
|
||||
const status = getStatus(n)
|
||||
const variant = status === 'healthy' ? 'badge-success'
|
||||
: status === 'draining' ? 'badge-info'
|
||||
: 'badge-warning'
|
||||
const title = context === 'models'
|
||||
? `${getName(n)} — ${getState(n)} (${status})`
|
||||
: `${getName(n)} — ${status}${getVersion(n) ? ` · v${getVersion(n)}` : ''}`
|
||||
return (
|
||||
<span key={n.node_id ?? n.NodeID ?? getName(n)} className={`badge ${variant}`} title={title}>
|
||||
<i className="fas fa-server" /> {getName(n)}
|
||||
</span>
|
||||
)
|
||||
})}
|
||||
</div>
|
||||
)
|
||||
}
|
||||
|
||||
// Summary mode for anything bigger. Count unhealthy/offline explicitly so
|
||||
// the chip tells an operator at-a-glance whether to click in. "Drift" for
|
||||
// backends = number of nodes whose (version, digest) differs from the majority.
|
||||
const total = list.length
|
||||
const offline = list.filter(n => {
|
||||
const s = getStatus(n)
|
||||
return s !== 'healthy' && s !== 'draining'
|
||||
}).length
|
||||
const drift = context === 'backends' ? countDrift(list) : 0
|
||||
const severity = offline > 0 || drift > 0 ? 'badge-warning' : 'badge-info'
|
||||
|
||||
return (
|
||||
<>
|
||||
<button
|
||||
ref={triggerRef}
|
||||
type="button"
|
||||
className={`badge ${severity} chip-trigger`}
|
||||
aria-expanded={open}
|
||||
aria-haspopup="dialog"
|
||||
onClick={e => { e.stopPropagation(); setOpen(v => !v) }}
|
||||
>
|
||||
<i className="fas fa-server" />
|
||||
{' '}on {total} node{total === 1 ? '' : 's'}
|
||||
{offline > 0 ? ` · ${offline} offline` : ''}
|
||||
{drift > 0 ? ` · ${drift} drift` : ''}
|
||||
</button>
|
||||
<Popover
|
||||
anchor={triggerRef}
|
||||
open={open}
|
||||
onClose={() => setOpen(false)}
|
||||
ariaLabel={context === 'models' ? 'Model distribution' : 'Backend distribution'}
|
||||
>
|
||||
<div className="popover__header">
|
||||
<strong>Installed on {total} node{total === 1 ? '' : 's'}</strong>
|
||||
{offline > 0 && <span className="badge badge-warning">{offline} offline</span>}
|
||||
{drift > 0 && <span className="badge badge-warning">{drift} drift</span>}
|
||||
</div>
|
||||
<div className="popover__scroll">
|
||||
<table className="table popover__table">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Node</th>
|
||||
<th>Status</th>
|
||||
{context === 'models' ? <th>State</th> : <>
|
||||
<th>Version</th>
|
||||
<th>Digest</th>
|
||||
</>}
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{list.map(n => (
|
||||
<tr key={n.node_id ?? n.NodeID ?? getName(n)}>
|
||||
<td className="cell-mono">{getName(n)}</td>
|
||||
<td>
|
||||
<span className={`badge ${getStatus(n) === 'healthy' ? 'badge-success' : 'badge-warning'}`}>
|
||||
{getStatus(n)}
|
||||
</span>
|
||||
</td>
|
||||
{context === 'models' ? (
|
||||
<td className="cell-mono">{getState(n) || '—'}</td>
|
||||
) : (
|
||||
<>
|
||||
<td className="cell-mono">{getVersion(n) ? `v${getVersion(n)}` : '—'}</td>
|
||||
<td className="cell-mono cell-truncate" title={getDigest(n)}>
|
||||
{getDigest(n) ? shortenDigest(getDigest(n)) : '—'}
|
||||
</td>
|
||||
</>
|
||||
)}
|
||||
</tr>
|
||||
))}
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</Popover>
|
||||
</>
|
||||
)
|
||||
}
|
||||
|
||||
// countDrift counts nodes whose (version, digest) disagrees with the cluster
|
||||
// majority. Mirrors the backend summarizeNodeDrift logic so the UI number
|
||||
// matches what CheckUpgradesAgainst emits in UpgradeInfo.NodeDrift.
|
||||
function countDrift(nodes) {
|
||||
if (nodes.length <= 1) return 0
|
||||
const counts = new Map()
|
||||
for (const n of nodes) {
|
||||
const key = `${n.version ?? n.Version ?? ''}|${n.digest ?? n.Digest ?? ''}`
|
||||
counts.set(key, (counts.get(key) || 0) + 1)
|
||||
}
|
||||
if (counts.size === 1) return 0 // unanimous
|
||||
let topKey = ''
|
||||
let topCount = 0
|
||||
for (const [k, v] of counts.entries()) {
|
||||
if (v > topCount) { topKey = k; topCount = v }
|
||||
}
|
||||
return nodes.length - topCount
|
||||
}
|
||||
|
||||
// shortenDigest trims a full OCI digest to the common 12-char form used in
|
||||
// docker/oci tooling. Falls back to the raw value if it doesn't match.
|
||||
function shortenDigest(digest) {
|
||||
const m = /^(sha\d+:)?([a-f0-9]+)$/i.exec(digest)
|
||||
if (!m) return digest
|
||||
const hex = m[2]
|
||||
return (m[1] ?? '') + hex.slice(0, 12)
|
||||
}
|
||||
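To make the drift count and digest shortening concrete, the two helpers above can be run standalone. This is a sketch: the function bodies follow the diff (with `Math.max` standing in for the explicit loop), and the sample nodes and digests are made up for illustration.

```javascript
// Nodes outside the most common (version, digest) pair count as drifted.
function countDrift(nodes) {
  if (nodes.length <= 1) return 0
  const counts = new Map()
  for (const n of nodes) {
    // Accept both snake_case JSON fields and PascalCase fields, as the diff does.
    const key = `${n.version ?? n.Version ?? ''}|${n.digest ?? n.Digest ?? ''}`
    counts.set(key, (counts.get(key) || 0) + 1)
  }
  if (counts.size === 1) return 0 // unanimous
  const topCount = Math.max(...counts.values())
  return nodes.length - topCount
}

// Keep the optional "shaNNN:" prefix plus the first 12 hex characters.
function shortenDigest(digest) {
  const m = /^(sha\d+:)?([a-f0-9]+)$/i.exec(digest)
  if (!m) return digest
  return (m[1] ?? '') + m[2].slice(0, 12)
}

// Hypothetical cluster: three nodes agree, one lags behind.
const nodes = [
  { node_name: 'w1', version: '1.2.0', digest: 'sha256:aaaa' },
  { node_name: 'w2', version: '1.2.0', digest: 'sha256:aaaa' },
  { node_name: 'w3', version: '1.2.0', digest: 'sha256:aaaa' },
  { NodeName: 'w4', Version: '1.1.0', Digest: 'sha256:bbbb' },
]

console.log(countDrift(nodes))
console.log(shortenDigest('sha256:0123456789abcdef0123'))
console.log(shortenDigest('not-a-digest'))
```

Note that with an even split (for example two nodes on each of two versions) there is no strict majority, and this logic reports half the cluster as drifted.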
86
core/http/react-ui/src/components/Popover.jsx
Normal file
@@ -0,0 +1,86 @@
|
||||
import { useEffect, useRef, useState, useCallback } from 'react'
|
||||
|
||||
// Minimal popover: positions itself below the trigger, left-aligned with its box,
|
||||
// flips above when there isn't room below, closes on outside click or Escape,
|
||||
// returns focus to the trigger. Uses the existing .card surface so it picks
|
||||
// up theme/border/shadow automatically — no new theming work.
|
||||
//
|
||||
// Props:
|
||||
// anchor: ref to the trigger DOMElement (required)
|
||||
// open: boolean
|
||||
// onClose: () => void
|
||||
// children: popover body
|
||||
// ariaLabel: accessible label for the dialog
|
||||
export default function Popover({ anchor, open, onClose, children, ariaLabel }) {
|
||||
const popoverRef = useRef(null)
|
||||
const [pos, setPos] = useState({ top: 0, left: 0, flipped: false })
|
||||
|
||||
// Compute position from the anchor's bounding box whenever we open or the
|
||||
// viewport changes. 320px is the width we reserve when clamping left; bigger content
|
||||
// grows naturally.
|
||||
const reposition = useCallback(() => {
|
||||
if (!anchor?.current) return
|
||||
const rect = anchor.current.getBoundingClientRect()
|
||||
const popoverHeight = popoverRef.current?.offsetHeight ?? 0
|
||||
const spaceBelow = window.innerHeight - rect.bottom
|
||||
const flipped = popoverHeight > spaceBelow - 16 && rect.top > popoverHeight
|
||||
const top = flipped ? rect.top - popoverHeight - 8 : rect.bottom + 8
|
||||
// Prefer left-aligned; clamp so we don't go off-screen right.
|
||||
const left = Math.min(rect.left, window.innerWidth - 320)
|
||||
setPos({ top, left: Math.max(8, left), flipped })
|
||||
}, [anchor])
|
||||
|
||||
useEffect(() => {
|
||||
if (!open) return
|
||||
reposition()
|
||||
window.addEventListener('resize', reposition)
|
||||
window.addEventListener('scroll', reposition, true)
|
||||
return () => {
|
||||
window.removeEventListener('resize', reposition)
|
||||
window.removeEventListener('scroll', reposition, true)
|
||||
}
|
||||
}, [open, reposition])
|
||||
|
||||
// Close on outside click or Escape. Mousedown (not click) so the close
|
||||
// happens before a parent handler could re-trigger us.
|
||||
useEffect(() => {
|
||||
if (!open) return
|
||||
const onMouseDown = (e) => {
|
||||
if (popoverRef.current && !popoverRef.current.contains(e.target) && !anchor?.current?.contains(e.target)) {
|
||||
onClose()
|
||||
}
|
||||
}
|
||||
const onKey = (e) => { if (e.key === 'Escape') onClose() }
|
||||
document.addEventListener('mousedown', onMouseDown)
|
||||
document.addEventListener('keydown', onKey)
|
||||
return () => {
|
||||
document.removeEventListener('mousedown', onMouseDown)
|
||||
document.removeEventListener('keydown', onKey)
|
||||
}
|
||||
}, [open, onClose, anchor])
|
||||
|
||||
// Return focus to the trigger when the popover closes — keyboard users
|
||||
// shouldn't have to tab back through the whole page to find their spot.
|
||||
useEffect(() => {
|
||||
if (!open && anchor?.current) {
|
||||
// requestAnimationFrame so the close is painted before focus jumps;
|
||||
// otherwise screen readers announce the trigger mid-transition.
|
||||
const raf = requestAnimationFrame(() => anchor.current?.focus?.())
|
||||
return () => cancelAnimationFrame(raf)
|
||||
}
|
||||
}, [open, anchor])
|
||||
|
||||
if (!open) return null
|
||||
|
||||
return (
|
||||
<div
|
||||
ref={popoverRef}
|
||||
role="dialog"
|
||||
aria-label={ariaLabel}
|
||||
className="popover card"
|
||||
style={{ top: pos.top, left: pos.left }}
|
||||
>
|
||||
{children}
|
||||
</div>
|
||||
)
|
||||
}
|
||||
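The flip and clamp arithmetic in `reposition` can be checked in isolation by pulling it into a pure function. The sketch below assumes a hypothetical `computePopoverPosition` helper that takes the anchor rectangle, the measured popover height, and the viewport size as plain values; the constants (16, 8, 320) are the ones used in the diff.

```javascript
// Hypothetical pure version of the positioning logic in Popover.jsx.
function computePopoverPosition(rect, popoverHeight, viewport) {
  const spaceBelow = viewport.height - rect.bottom
  // Flip above only when the popover does not fit below (with a 16px margin)
  // and there is enough room above the anchor to hold it.
  const flipped = popoverHeight > spaceBelow - 16 && rect.top > popoverHeight
  const top = flipped ? rect.top - popoverHeight - 8 : rect.bottom + 8
  // Left-align with the anchor, keep 320px free on the right, never go below 8px.
  const left = Math.max(8, Math.min(rect.left, viewport.width - 320))
  return { top, left, flipped }
}

const viewport = { width: 1000, height: 800 }

// Anchor near the top: plenty of room below, no flip.
const below = computePopoverPosition({ top: 100, bottom: 120, left: 50 }, 200, viewport)

// Anchor near the bottom-right: flips above and clamps horizontally.
const above = computePopoverPosition({ top: 700, bottom: 720, left: 900 }, 200, viewport)

console.log(below, above)
```

Because the popover is not yet mounted on the first call, its measured height is 0 at that point in the component, so the first computed position never flips; the sketch sidesteps that by taking the height as an argument.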
@@ -3,6 +3,8 @@ import { useNavigate, useOutletContext, useSearchParams } from 'react-router-dom
|
||||
import ResourceMonitor from '../components/ResourceMonitor'
|
||||
import ConfirmDialog from '../components/ConfirmDialog'
|
||||
import Toggle from '../components/Toggle'
|
||||
import NodeDistributionChip from '../components/NodeDistributionChip'
|
||||
import FilterBar from '../components/FilterBar'
|
||||
import { useModels } from '../hooks/useModels'
|
||||
import { backendControlApi, modelsApi, backendsApi, systemApi, nodesApi } from '../utils/api'
|
||||
|
||||
@@ -11,6 +13,22 @@ const TABS = [
|
||||
{ key: 'backends', label: 'Backends', icon: 'fa-server' },
|
||||
]
|
||||
|
||||
// formatInstalledAt renders an installed_at timestamp as a short relative/abs
|
||||
// string suitable for dense tables. Returns the raw value if parsing fails so
|
||||
// we never display "Invalid Date".
|
||||
function formatInstalledAt(value) {
|
||||
if (!value) return '—'
|
||||
const d = new Date(value)
|
||||
if (isNaN(d.getTime())) return value
|
||||
const now = Date.now()
|
||||
const diffMin = Math.floor((now - d.getTime()) / 60000)
|
||||
if (diffMin < 1) return 'just now'
|
||||
if (diffMin < 60) return `${diffMin}m ago`
|
||||
if (diffMin < 60 * 24) return `${Math.floor(diffMin / 60)}h ago`
|
||||
if (diffMin < 60 * 24 * 30) return `${Math.floor(diffMin / (60 * 24))}d ago`
|
||||
return d.toISOString().slice(0, 10)
|
||||
}
|
||||
|
||||
export default function Manage() {
|
||||
const { addToast } = useOutletContext()
|
||||
const navigate = useNavigate()
|
||||
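The bucket boundaries in `formatInstalledAt` are easier to see with fixed inputs. The sketch below repeats the function from the hunk above with one assumed change, an injectable `now` argument, so the output does not depend on the wall clock; the sample timestamps are illustrative.

```javascript
// Same bucketing as formatInstalledAt in the diff; `now` is injectable here
// (an assumption for this sketch) so results are deterministic.
function formatInstalledAt(value, now = Date.now()) {
  if (!value) return '\u2014' // same placeholder dash the diff returns
  const d = new Date(value)
  if (isNaN(d.getTime())) return value // never show "Invalid Date"
  const diffMin = Math.floor((now - d.getTime()) / 60000)
  if (diffMin < 1) return 'just now'
  if (diffMin < 60) return `${diffMin}m ago`
  if (diffMin < 60 * 24) return `${Math.floor(diffMin / 60)}h ago`
  if (diffMin < 60 * 24 * 30) return `${Math.floor(diffMin / (60 * 24))}d ago`
  return d.toISOString().slice(0, 10)
}

const now = Date.UTC(2024, 0, 31, 12, 0, 0)
const samples = [
  '2024-01-31T11:55:00Z', // five minutes earlier
  '2024-01-31T09:00:00Z', // three hours earlier
  '2024-01-29T12:00:00Z', // two days earlier
  '2023-11-01T00:00:00Z', // older than 30 days
  'garbage',              // unparseable
  '',                     // missing
]

for (const s of samples) {
  console.log(JSON.stringify(s), '->', formatInstalledAt(s, now))
}
```

One consequence of the first branch: a timestamp slightly in the future (for example from clock skew between nodes) yields a negative difference and is reported as "just now".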
@@ -28,6 +46,24 @@ export default function Manage() {
|
||||
const [distributedMode, setDistributedMode] = useState(false)
|
||||
const [togglingModels, setTogglingModels] = useState(new Set())
|
||||
const [pinningModels, setPinningModels] = useState(new Set())
|
||||
// Filter state per tab. Persisted in the URL query so switching tabs
|
||||
// doesn't lose the filter the operator just set.
|
||||
const [modelsSearch, setModelsSearch] = useState(() => searchParams.get('mq') || '')
|
||||
const [modelsFilter, setModelsFilter] = useState(() => searchParams.get('mf') || 'all')
|
||||
const [backendsSearch, setBackendsSearch] = useState(() => searchParams.get('bq') || '')
|
||||
const [backendsFilter, setBackendsFilter] = useState(() => searchParams.get('bf') || 'all')
|
||||
|
||||
// Sync filter state into the URL so deep-links + tab switches survive.
|
||||
useEffect(() => {
|
||||
const p = new URLSearchParams(searchParams)
|
||||
const setOrDelete = (k, v) => { if (v && v !== 'all') p.set(k, v); else p.delete(k) }
|
||||
setOrDelete('mq', modelsSearch)
|
||||
setOrDelete('mf', modelsFilter)
|
||||
setOrDelete('bq', backendsSearch)
|
||||
setOrDelete('bf', backendsFilter)
|
||||
setSearchParams(p, { replace: true })
|
||||
// eslint-disable-next-line react-hooks/exhaustive-deps
|
||||
}, [modelsSearch, modelsFilter, backendsSearch, backendsFilter])
|
||||
|
||||
const handleTabChange = (tab) => {
|
||||
setActiveTab(tab)
|
||||
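The URL sync in this hunk keeps unrelated query parameters and drops any filter that is empty or `'all'`. A standalone sketch of that rule, using a hypothetical `syncFilterParams` function in place of the `useEffect` body, might look like this:

```javascript
// Hypothetical extraction of the effect body: returns the next query string
// state instead of calling setSearchParams.
function syncFilterParams(current, state) {
  const p = new URLSearchParams(current)
  // Same rule as the diff: only non-empty, non-'all' values are written.
  const setOrDelete = (k, v) => { if (v && v !== 'all') p.set(k, v); else p.delete(k) }
  setOrDelete('mq', state.modelsSearch)
  setOrDelete('mf', state.modelsFilter)
  setOrDelete('bq', state.backendsSearch)
  setOrDelete('bf', state.backendsFilter)
  return p
}

const next = syncFilterParams('tab=models&mf=running', {
  modelsSearch: 'llama',
  modelsFilter: 'all',
  backendsSearch: '',
  backendsFilter: 'all',
})

console.log(next.toString())
```

Since the same helper handles both search text and filter keys, a search for the literal string `all` is also dropped from the URL under this rule; whether that matters in practice is a judgment call for the author.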
@@ -64,6 +100,35 @@ export default function Manage() {
|
||||
nodesApi.list().then(() => setDistributedMode(true)).catch(() => {})
|
||||
}, [fetchLoadedModels, fetchBackends])
|
||||
|
||||
// Auto-refresh the Models tab every 10s in distributed mode so ghost models
|
||||
// (loaded on a worker but absent from this frontend's in-memory cache)
|
||||
// clear on their own without the user clicking Update.
|
||||
const [lastSyncedAt, setLastSyncedAt] = useState(() => Date.now())
|
||||
const [nowTick, setNowTick] = useState(() => Date.now())
|
||||
useEffect(() => {
|
||||
if (!distributedMode || activeTab !== 'models') return
|
||||
const interval = setInterval(() => {
|
||||
refetchModels()
|
||||
fetchLoadedModels()
|
||||
setLastSyncedAt(Date.now())
|
||||
}, 10000)
|
||||
return () => clearInterval(interval)
|
||||
}, [distributedMode, activeTab, refetchModels, fetchLoadedModels])
|
||||
|
||||
// Drive the "last synced Ns ago" label without over-rendering the table.
|
||||
useEffect(() => {
|
||||
if (!distributedMode) return
|
||||
const interval = setInterval(() => setNowTick(Date.now()), 1000)
|
||||
return () => clearInterval(interval)
|
||||
}, [distributedMode])
|
||||
const lastSyncedAgo = (() => {
|
||||
const s = Math.max(0, Math.floor((nowTick - lastSyncedAt) / 1000))
|
||||
if (s < 5) return 'just now'
|
||||
if (s < 60) return `${s}s ago`
|
||||
const m = Math.floor(s / 60)
|
||||
return `${m}m ago`
|
||||
})()
|
||||
|
||||
// Fetch available backend upgrades
|
||||
useEffect(() => {
|
||||
if (activeTab === 'backends') {
|
||||
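The "last synced" label is a small pure computation over two timestamps. As a sketch, it can be written as a function (named `formatSyncedAgo` here, a hypothetical name) and exercised without React:

```javascript
// Hypothetical pure form of the lastSyncedAgo expression in the diff.
function formatSyncedAgo(nowTick, lastSyncedAt) {
  // Clamp at zero so a tick that lags behind the sync time never goes negative.
  const s = Math.max(0, Math.floor((nowTick - lastSyncedAt) / 1000))
  if (s < 5) return 'just now'
  if (s < 60) return `${s}s ago`
  return `${Math.floor(s / 60)}m ago`
}

console.log(formatSyncedAgo(3000, 0))
console.log(formatSyncedAgo(42000, 0))
console.log(formatSyncedAgo(125000, 0))
```

With the 10-second refresh interval from the hunk above, the label would normally cycle between "just now" and single-digit seconds while the Models tab is active; the minutes branch only shows up when auto-refresh is paused, for example on the Backends tab.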
@@ -196,6 +261,29 @@ export default function Manage() {
|
||||
}
|
||||
}
|
||||
|
||||
const [upgradingAll, setUpgradingAll] = useState(false)
|
||||
const [showOnlyUpgradable, setShowOnlyUpgradable] = useState(false)
|
||||
const handleUpgradeAll = async () => {
|
||||
const names = Object.keys(upgrades)
|
||||
if (names.length === 0) return
|
||||
setUpgradingAll(true)
|
||||
try {
|
||||
// Serial upgrade — matches the gallery's Upgrade All behavior.
|
||||
// Each backend upgrade is itself a cluster-wide fan-out, so parallel
|
||||
// calls would multiply load on every worker.
|
||||
for (const name of names) {
|
||||
try {
|
||||
await backendsApi.upgrade(name)
|
||||
} catch (err) {
|
||||
addToast(`Upgrade failed for ${name}: ${err.message}`, 'error')
|
||||
}
|
||||
}
|
||||
addToast(`Upgrade started for ${names.length} backend${names.length === 1 ? '' : 's'}`, 'info')
|
||||
} finally {
|
||||
setUpgradingAll(false)
|
||||
}
|
||||
}
|
||||
|
||||
const handleDeleteBackend = (name) => {
|
||||
setConfirmDialog({
|
||||
title: 'Delete Backend',
|
||||
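The serial loop in `handleUpgradeAll` trades speed for bounded load: each upgrade finishes (or fails) before the next begins, and one failure does not stop the rest. The sketch below shows that control flow with a fake upgrade function; `upgradeAllSerial`, `fakeUpgrade`, and the returned `failed` count are assumptions for illustration and do not appear in the diff.

```javascript
// Hypothetical generalization of the loop in handleUpgradeAll.
async function upgradeAllSerial(names, upgrade, onError) {
  let failed = 0
  for (const name of names) {
    try {
      // Awaiting inside the loop keeps at most one upgrade in flight.
      await upgrade(name)
    } catch (err) {
      failed += 1
      onError(name, err) // report and continue with the next backend
    }
  }
  return { attempted: names.length, failed }
}

// Stand-in for backendsApi.upgrade that records call order and fails once.
const calls = []
const fakeUpgrade = async (name) => {
  calls.push(`start:${name}`)
  if (name === 'b') throw new Error('boom')
  calls.push(`done:${name}`)
}

const errors = []
const result = upgradeAllSerial(['a', 'b', 'c'], fakeUpgrade, (name, err) => {
  errors.push(`${name}: ${err.message}`)
})

result.then(r => console.log(r, calls, errors))
```

Returning a failure count is one way a caller could word the final toast differently when some upgrades failed; the handler in the diff always reports the total number of backends as started.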
@@ -227,29 +315,74 @@ export default function Manage() {
|
||||
|
||||
{/* Tabs */}
|
||||
<div className="tabs" style={{ marginTop: 'var(--spacing-lg)', marginBottom: 'var(--spacing-md)' }}>
|
||||
{TABS.map(t => (
|
||||
<button
|
||||
key={t.key}
|
||||
className={`tab ${activeTab === t.key ? 'tab-active' : ''}`}
|
||||
onClick={() => handleTabChange(t.key)}
|
||||
>
|
||||
<i className={`fas ${t.icon}`} style={{ marginRight: 6 }} />
|
||||
{t.label}
|
||||
{t.key === 'models' && !modelsLoading && ` (${models.length})`}
|
||||
{t.key === 'backends' && !backendsLoading && ` (${backends.length})`}
|
||||
</button>
|
||||
))}
|
||||
{TABS.map(t => {
|
||||
const upgradeCount = t.key === 'backends' ? Object.keys(upgrades).length : 0
|
||||
return (
|
||||
<button
|
||||
key={t.key}
|
||||
className={`tab ${activeTab === t.key ? 'tab-active' : ''}`}
|
||||
onClick={() => handleTabChange(t.key)}
|
||||
>
|
||||
<i className={`fas ${t.icon}`} style={{ marginRight: 6 }} />
|
||||
{t.label}
|
||||
{t.key === 'models' && !modelsLoading && ` (${models.length})`}
|
||||
{t.key === 'backends' && !backendsLoading && ` (${backends.length})`}
|
||||
{upgradeCount > 0 && (
|
||||
<span className="tab-pill tab-pill--warning" title={`${upgradeCount} update${upgradeCount === 1 ? '' : 's'} available`}>
|
||||
<i className="fas fa-arrow-up" /> {upgradeCount}
|
||||
</span>
|
||||
)}
|
||||
</button>
|
||||
)
|
||||
})}
|
||||
</div>
|
||||
|
||||
{/* Models Tab */}
|
||||
{activeTab === 'models' && (
|
||||
{activeTab === 'models' && (() => {
|
||||
// Computed filters — done here so the result is available both to
|
||||
// the FilterBar counts and to the table body.
|
||||
const MODEL_FILTERS = [
|
||||
{ key: 'all', label: 'All', icon: 'fa-layer-group' },
|
||||
{ key: 'running', label: 'Running', icon: 'fa-circle-play' },
|
||||
{ key: 'idle', label: 'Idle', icon: 'fa-pause' },
|
||||
{ key: 'disabled', label: 'Disabled', icon: 'fa-ban' },
|
||||
{ key: 'pinned', label: 'Pinned', icon: 'fa-thumbtack' },
|
||||
...(distributedMode ? [{ key: 'distributed', label: 'Distributed', icon: 'fa-server' }] : []),
|
||||
]
|
||||
const passesFilter = (m) => {
|
||||
if (modelsFilter === 'running') return !m.disabled && (loadedModelIds.has(m.id) || (m.loaded_on && m.loaded_on.length > 0))
|
||||
if (modelsFilter === 'idle') return !m.disabled && !loadedModelIds.has(m.id) && !(m.loaded_on && m.loaded_on.length > 0)
|
||||
if (modelsFilter === 'disabled') return !!m.disabled
|
||||
if (modelsFilter === 'pinned') return !!m.pinned
|
||||
if (modelsFilter === 'distributed') return Array.isArray(m.loaded_on) && m.loaded_on.length > 0
|
||||
return true
|
||||
}
|
||||
const q = modelsSearch.trim().toLowerCase()
|
||||
const passesSearch = (m) => !q || (m.id || '').toLowerCase().includes(q) || (m.backend || '').toLowerCase().includes(q)
|
||||
const visibleModels = models.filter(m => passesFilter(m) && passesSearch(m))
|
||||
return (
|
||||
<div>
|
||||
<div style={{ display: 'flex', alignItems: 'center', justifyContent: 'flex-end', marginBottom: 'var(--spacing-md)' }}>
|
||||
<button className="btn btn-secondary btn-sm" onClick={handleReload} disabled={reloading}>
|
||||
<i className={`fas ${reloading ? 'fa-spinner fa-spin' : 'fa-rotate'}`} />
|
||||
{reloading ? 'Updating...' : 'Update'}
|
||||
</button>
|
||||
</div>
|
||||
<FilterBar
|
||||
search={modelsSearch}
|
||||
onSearchChange={setModelsSearch}
|
||||
searchPlaceholder="Search models by name or backend..."
|
||||
filters={MODEL_FILTERS}
|
||||
activeFilter={modelsFilter}
|
||||
onFilterChange={setModelsFilter}
|
||||
rightSlot={(
|
||||
<>
|
||||
{distributedMode && (
|
||||
<span className="cell-muted" title="Auto-refreshes every 10s in distributed mode so ghost models clear promptly">
|
||||
<i className="fas fa-rotate" /> Last synced {lastSyncedAgo}
|
||||
</span>
|
||||
)}
|
||||
<button className="btn btn-secondary btn-sm" onClick={handleReload} disabled={reloading}>
|
||||
<i className={`fas ${reloading ? 'fa-spinner fa-spin' : 'fa-rotate'}`} />
|
||||
{reloading ? ' Updating...' : ' Update'}
|
||||
</button>
|
||||
</>
|
||||
)}
|
||||
/>
|
||||
|
||||
{modelsLoading ? (
|
||||
<div className="card" style={{ padding: 'var(--spacing-xl)', textAlign: 'center', color: 'var(--color-text-muted)' }}>
|
||||
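The `passesFilter` and `passesSearch` predicates introduced in this hunk combine local load state (`loadedModelIds`) with per-node state (`loaded_on`). A minimal sketch of the same rules outside the component, with hypothetical helper names and made-up models, could be:

```javascript
// Hypothetical factory mirroring passesFilter from the diff.
function makeModelFilter(filterKey, loadedModelIds) {
  const onNodes = m => Array.isArray(m.loaded_on) && m.loaded_on.length > 0
  const isLoaded = m => loadedModelIds.has(m.id) || onNodes(m)
  switch (filterKey) {
    case 'running': return m => !m.disabled && isLoaded(m)
    case 'idle': return m => !m.disabled && !isLoaded(m)
    case 'disabled': return m => !!m.disabled
    case 'pinned': return m => !!m.pinned
    case 'distributed': return onNodes
    default: return () => true // 'all' or an unknown key
  }
}

// Mirrors passesSearch: case-insensitive match on id or backend.
function matchesSearch(m, query) {
  const q = query.trim().toLowerCase()
  return !q || (m.id || '').toLowerCase().includes(q) || (m.backend || '').toLowerCase().includes(q)
}

// Illustrative data only.
const models = [
  { id: 'm1', backend: 'llama-cpp' },
  { id: 'm2', backend: 'whisper', loaded_on: [{ node_name: 'w1' }] },
  { id: 'm3', backend: 'llama-cpp', disabled: true, loaded_on: [] },
  { id: 'm4', backend: 'piper', pinned: true },
]
const loadedModelIds = new Set(['m1'])

const visible = (filterKey, query) =>
  models.filter(m => makeModelFilter(filterKey, loadedModelIds)(m) && matchesSearch(m, query)).map(m => m.id)

console.log(visible('running', ''), visible('idle', ''), visible('all', 'llama'))
```

Under these rules a disabled model is never "running" or "idle", so the Running, Idle, and Disabled chips partition the list, while Pinned and Distributed overlap with them.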
@@ -274,6 +407,12 @@ export default function Manage() {
|
||||
</a>
|
||||
</div>
|
||||
</div>
|
||||
) : visibleModels.length === 0 ? (
|
||||
<div className="empty-state">
|
||||
<i className="fas fa-filter" />
|
||||
<p>No models match the current filter.</p>
|
||||
<button className="btn btn-ghost btn-sm" onClick={() => { setModelsSearch(''); setModelsFilter('all') }}>Clear filters</button>
|
||||
</div>
|
||||
) : (
|
||||
<div className="table-container">
|
||||
<table className="table">
|
||||
@@ -288,7 +427,7 @@ export default function Manage() {
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{models.map(model => (
|
||||
{visibleModels.map(model => (
|
||||
<tr key={model.id} style={{ opacity: model.disabled ? 0.55 : 1, transition: 'opacity 0.2s' }}>
|
||||
{/* Enable/Disable toggle */}
|
||||
<td>
|
||||
@@ -329,21 +468,33 @@ export default function Manage() {
|
||||
</div>
|
||||
</div>
|
||||
</td>
|
||||
{/* Status */}
|
||||
{/* Status / Distribution */}
|
||||
<td>
|
||||
{model.disabled ? (
|
||||
<span className="badge" style={{ background: 'var(--color-bg-tertiary)', color: 'var(--color-text-muted)' }}>
|
||||
<i className="fas fa-ban" style={{ fontSize: '6px' }} /> Disabled
|
||||
</span>
|
||||
) : loadedModelIds.has(model.id) ? (
|
||||
<span className="badge badge-success">
|
||||
<i className="fas fa-circle" style={{ fontSize: '6px' }} /> Running
|
||||
</span>
|
||||
) : (
|
||||
<span className="badge" style={{ background: 'var(--color-bg-tertiary)', color: 'var(--color-text-muted)' }}>
|
||||
<i className="fas fa-circle" style={{ fontSize: '6px' }} /> Idle
|
||||
</span>
|
||||
)}
|
||||
<div className="cell-stack">
|
||||
{model.disabled ? (
|
||||
<span className="badge" style={{ background: 'var(--color-bg-tertiary)', color: 'var(--color-text-muted)' }}>
|
||||
<i className="fas fa-ban" /> Disabled
|
||||
</span>
|
||||
) : model.loaded_on && model.loaded_on.length > 0 ? (
|
||||
// Distributed mode: surface where the model is
|
||||
// actually loaded. Shared chip scales to any cluster
|
||||
// size (inline for <=3, popover for larger).
|
||||
<NodeDistributionChip nodes={model.loaded_on} context="models" />
|
||||
) : loadedModelIds.has(model.id) ? (
|
||||
<span className="badge badge-success">
|
||||
<i className="fas fa-circle" style={{ fontSize: '6px' }} /> Running
|
||||
</span>
|
||||
) : (
|
||||
<span className="badge" style={{ background: 'var(--color-bg-tertiary)', color: 'var(--color-text-muted)' }}>
|
||||
<i className="fas fa-circle" style={{ fontSize: '6px' }} /> Idle
|
||||
</span>
|
||||
)}
|
||||
{model.source === 'registry-only' && (
|
||||
<span className="badge badge-warning" title="Discovered on a worker but not configured locally. Persist the config to make it permanent.">
|
||||
<i className="fas fa-ghost" /> Adopted
|
||||
</span>
|
||||
)}
|
||||
</div>
|
||||
</td>
|
||||
{/* Backend */}
|
||||
<td>
|
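The Status / Distribution cell in this hunk picks one primary badge per model through a chain of ternaries, then appends an "Adopted" badge independently. The following is a minimal sketch of that precedence as a plain function, assuming the field names visible in the diff (`disabled`, `loaded_on`, `source`) and that `loadedModelIds` is a `Set`. The helper name `modelStatus` and its return labels are hypothetical and do not appear in the codebase.

```javascript
// Hypothetical helper: mirrors the badge precedence of the Status / Distribution cell.
// Assumes `loadedModelIds` is a Set of model ids loaded on the local instance.
function modelStatus(model, loadedModelIds) {
  let primary
  if (model.disabled) {
    // Disabled wins even if replicas are still reported as loaded.
    primary = 'disabled'
  } else if (model.loaded_on && model.loaded_on.length > 0) {
    // Distributed mode: the registry reported at least one node.
    primary = 'distributed'
  } else if (loadedModelIds.has(model.id)) {
    primary = 'running'
  } else {
    primary = 'idle'
  }
  // The "Adopted" badge is rendered next to the primary badge, not instead of it.
  return { primary, adopted: model.source === 'registry-only' }
}
```

Keeping the precedence in one function would make the "disabled beats distributed beats running" ordering easy to unit-test without rendering the table.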
||||
@@ -394,11 +545,34 @@ export default function Manage() {
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
)}
|
||||
)
|
||||
})()}
|
||||
|
||||
{/* Backends Tab */}
|
||||
{activeTab === 'backends' && (
|
||||
<div>
|
||||
{/* Upgrade banner — mirrors the gallery so operators can't miss updates */}
|
||||
{!backendsLoading && Object.keys(upgrades).length > 0 && (
|
||||
<div className="upgrade-banner">
|
||||
<div className="upgrade-banner__text">
|
||||
<i className="fas fa-arrow-up" />
|
||||
<span>
|
||||
{Object.keys(upgrades).length} backend{Object.keys(upgrades).length === 1 ? ' has' : 's have'} updates available
|
||||
</span>
|
||||
</div>
|
||||
<div className="upgrade-banner__actions">
|
||||
<button
|
||||
className="btn btn-primary btn-sm"
|
||||
onClick={handleUpgradeAll}
|
||||
disabled={upgradingAll}
|
||||
>
|
||||
<i className={`fas ${upgradingAll ? 'fa-spinner fa-spin' : 'fa-arrow-up'}`} />
|
||||
{upgradingAll ? ' Upgrading...' : ' Upgrade all'}
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
)}
|
||||
|
||||
{backendsLoading ? (
|
||||
<div style={{ textAlign: 'center', padding: 'var(--spacing-md)', color: 'var(--color-text-muted)', fontSize: '0.875rem' }}>
|
||||
Loading backends...
|
||||
@@ -419,109 +593,217 @@ export default function Manage() {
|
||||
</a>
|
||||
</div>
|
||||
</div>
|
||||
) : (
|
||||
<div className="table-container">
|
||||
) : (() => {
|
||||
// Count chip badges: show N in the filter buttons so operators can
|
||||
// see at a glance how their chips bucket the list.
|
||||
const upgradableCount = backends.filter(b => upgrades[b.Name]).length
|
||||
const userCount = backends.filter(b => !b.IsSystem).length
|
||||
const systemCount = backends.filter(b => b.IsSystem).length
|
||||
const metaCount = backends.filter(b => b.IsMeta).length
|
||||
const offlineCount = backends.filter(b => {
|
||||
const n = b.Nodes || b.nodes || []
|
||||
return n.some(x => {
|
||||
const s = x.node_status || x.NodeStatus
|
||||
return s && s !== 'healthy' && s !== 'draining'
|
||||
})
|
||||
}).length
|
||||
|
||||
const BACKEND_FILTERS = [
|
||||
{ key: 'all', label: 'All', icon: 'fa-layer-group', count: backends.length },
|
||||
{ key: 'user', label: 'User', icon: 'fa-download', count: userCount },
|
||||
{ key: 'system', label: 'System', icon: 'fa-shield-alt', count: systemCount },
|
||||
{ key: 'meta', label: 'Meta', icon: 'fa-layer-group', count: metaCount },
|
||||
...(upgradableCount > 0 ? [{ key: 'upgradable', label: 'Updates', icon: 'fa-arrow-up', count: upgradableCount }] : []),
|
||||
...(distributedMode && offlineCount > 0 ? [{ key: 'offline', label: 'Offline nodes', icon: 'fa-exclamation-circle', count: offlineCount }] : []),
|
||||
]
|
||||
const q = backendsSearch.trim().toLowerCase()
|
||||
const passesSearch = (b) => !q
|
||||
|| (b.Name || '').toLowerCase().includes(q)
|
||||
|| (b.Metadata?.alias || '').toLowerCase().includes(q)
|
||||
|| (b.Metadata?.meta_backend_for || '').toLowerCase().includes(q)
|
||||
const passesFilter = (b) => {
|
||||
switch (backendsFilter) {
|
||||
case 'user': return !b.IsSystem
|
||||
case 'system': return !!b.IsSystem
|
||||
case 'meta': return !!b.IsMeta
|
||||
case 'upgradable': return !!upgrades[b.Name]
|
||||
case 'offline': {
|
||||
const n = b.Nodes || b.nodes || []
|
||||
return n.some(x => {
|
||||
const s = x.node_status || x.NodeStatus
|
||||
return s && s !== 'healthy' && s !== 'draining'
|
||||
})
|
||||
}
|
||||
default: return true
|
||||
}
|
||||
}
|
||||
// Legacy "showOnlyUpgradable" toggle is now the 'upgradable' chip —
|
||||
// keep backward-compat by mapping it onto the new filter.
|
||||
if (showOnlyUpgradable && backendsFilter !== 'upgradable') {
|
||||
// One-shot reconciliation — the old state becomes the new chip.
|
||||
setBackendsFilter('upgradable')
|
||||
setShowOnlyUpgradable(false)
|
||||
}
|
||||
const visibleBackends = backends.filter(b => passesFilter(b) && passesSearch(b))
|
||||
if (visibleBackends.length === 0) {
|
||||
return (
|
||||
<>
|
||||
<FilterBar
|
||||
search={backendsSearch}
|
||||
onSearchChange={setBackendsSearch}
|
||||
searchPlaceholder="Search backends by name or alias..."
|
||||
filters={BACKEND_FILTERS}
|
||||
activeFilter={backendsFilter}
|
||||
onFilterChange={setBackendsFilter}
|
||||
/>
|
||||
<div className="empty-state">
|
||||
<i className="fas fa-filter" />
|
||||
<p>No backends match the current filter.</p>
|
||||
<button className="btn btn-ghost btn-sm" onClick={() => { setBackendsSearch(''); setBackendsFilter('all') }}>Clear filters</button>
|
||||
</div>
|
||||
</>
|
||||
)
|
||||
}
|
||||
return (
|
||||
<>
|
||||
<FilterBar
|
||||
search={backendsSearch}
|
||||
onSearchChange={setBackendsSearch}
|
||||
searchPlaceholder="Search backends by name or alias..."
|
||||
filters={BACKEND_FILTERS}
|
||||
activeFilter={backendsFilter}
|
||||
onFilterChange={setBackendsFilter}
|
||||
/>
|
||||
<div className="table-container">
|
||||
<table className="table">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Name</th>
|
||||
<th>Type</th>
|
||||
<th>Metadata</th>
|
||||
<th>Version</th>
|
||||
{distributedMode && <th>Nodes</th>}
|
||||
<th>Installed</th>
|
||||
<th style={{ textAlign: 'right' }}>Actions</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{backends.map((backend, i) => (
|
||||
{visibleBackends.map((backend, i) => {
|
||||
const upgradeInfo = upgrades[backend.Name]
|
||||
const hasDrift = upgradeInfo?.node_drift?.length > 0
|
||||
const nodes = backend.Nodes || backend.nodes || []
|
||||
return (
|
||||
<tr key={backend.Name || i}>
|
||||
<td>
|
||||
<div style={{ display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)' }}>
|
||||
<i className="fas fa-cog" style={{ color: 'var(--color-accent)', fontSize: '0.75rem' }} />
|
||||
<span style={{ fontWeight: 500 }}>{backend.Name}</span>
|
||||
<div className="cell-name">
|
||||
<i className="fas fa-cog" />
|
||||
<span>{backend.Name}</span>
|
||||
{backend.Metadata?.alias && (
|
||||
<span className="cell-subtle">alias: {backend.Metadata.alias}</span>
|
||||
)}
|
||||
{backend.Metadata?.meta_backend_for && (
|
||||
<span className="cell-subtle">for: {backend.Metadata.meta_backend_for}</span>
|
||||
)}
|
||||
</div>
|
||||
</td>
|
||||
<td>
|
||||
<div style={{ display: 'flex', gap: '4px', flexWrap: 'wrap' }}>
|
||||
<div className="badge-row">
|
||||
{backend.IsSystem ? (
|
||||
<span className="badge badge-info" style={{ fontSize: '0.625rem' }}>
|
||||
<i className="fas fa-shield-alt" style={{ fontSize: '0.5rem', marginRight: 2 }} />System
|
||||
<span className="badge badge-info">
|
||||
<i className="fas fa-shield-alt" /> System
|
||||
</span>
|
||||
) : (
|
||||
<span className="badge badge-success" style={{ fontSize: '0.625rem' }}>
|
||||
<i className="fas fa-download" style={{ fontSize: '0.5rem', marginRight: 2 }} />User
|
||||
<span className="badge badge-success">
|
||||
<i className="fas fa-download" /> User
|
||||
</span>
|
||||
)}
|
||||
{backend.IsMeta && (
|
||||
<span className="badge" style={{ background: 'var(--color-accent-light)', color: 'var(--color-accent)', fontSize: '0.625rem' }}>
|
||||
<i className="fas fa-layer-group" style={{ fontSize: '0.5rem', marginRight: 2 }} />Meta
|
||||
<span className="badge badge-accent">
|
||||
<i className="fas fa-layer-group" /> Meta
|
||||
</span>
|
||||
)}
|
||||
</div>
|
||||
</td>
|
||||
<td>
|
||||
<div style={{ display: 'flex', flexDirection: 'column', gap: 2, fontSize: '0.75rem', color: 'var(--color-text-secondary)' }}>
|
||||
{backend.Metadata?.alias && (
|
||||
<span>
|
||||
<i className="fas fa-tag" style={{ fontSize: '0.5rem', marginRight: 4 }} />
|
||||
Alias: <span style={{ color: 'var(--color-text-primary)' }}>{backend.Metadata.alias}</span>
|
||||
<div className="cell-stack">
|
||||
{backend.Metadata?.version ? (
|
||||
<span className="cell-mono">v{backend.Metadata.version}</span>
|
||||
) : (
|
||||
<span className="cell-muted">—</span>
|
||||
)}
|
||||
{upgradeInfo && (
|
||||
<span className="badge badge-warning" title={upgradeInfo.available_version ? `Upgrade to v${upgradeInfo.available_version}` : 'Update available'}>
|
||||
<i className="fas fa-arrow-up" />
|
||||
{upgradeInfo.available_version ? ` v${upgradeInfo.available_version}` : ' Update available'}
|
||||
</span>
|
||||
)}
|
||||
{backend.Metadata?.meta_backend_for && (
|
||||
<span>
|
||||
<i className="fas fa-link" style={{ fontSize: '0.5rem', marginRight: 4 }} />
|
||||
For: <span style={{ color: 'var(--color-accent)' }}>{backend.Metadata.meta_backend_for}</span>
|
||||
{hasDrift && (
|
||||
<span
|
||||
className="badge badge-warning"
|
||||
title={`Drift: ${upgradeInfo.node_drift.map(d => `${d.node_name}${d.version ? ' v' + d.version : ''}`).join(', ')}`}
|
||||
>
|
||||
<i className="fas fa-code-branch" />
|
||||
{' '}Drift: {upgradeInfo.node_drift.length} node{upgradeInfo.node_drift.length === 1 ? '' : 's'}
|
||||
</span>
|
||||
)}
|
||||
{backend.Metadata?.version && (
|
||||
<span>
|
||||
<i className="fas fa-code-branch" style={{ fontSize: '0.5rem', marginRight: 4 }} />
|
||||
Version: <span style={{ color: 'var(--color-text-primary)' }}>v{backend.Metadata.version}</span>
|
||||
{upgrades[backend.Name] && (
|
||||
<span style={{ color: '#856404', marginLeft: 4 }}>
|
||||
→ v{upgrades[backend.Name].available_version}
|
||||
</span>
|
||||
)}
|
||||
</span>
|
||||
)}
|
||||
{backend.Metadata?.installed_at && (
|
||||
<span>
|
||||
<i className="fas fa-calendar" style={{ fontSize: '0.5rem', marginRight: 4 }} />
|
||||
{backend.Metadata.installed_at}
|
||||
</span>
|
||||
)}
|
||||
{!backend.Metadata?.alias && !backend.Metadata?.meta_backend_for && !backend.Metadata?.installed_at && '—'}
|
||||
</div>
|
||||
</td>
|
||||
{distributedMode && (
|
||||
<td>
|
||||
<NodeDistributionChip nodes={nodes} context="backends" />
|
||||
</td>
|
||||
)}
|
||||
<td>
|
||||
<div style={{ display: 'flex', gap: 'var(--spacing-xs)', justifyContent: 'flex-end' }}>
|
||||
{!backend.IsSystem ? (
|
||||
<span className="cell-muted cell-mono">
|
||||
{backend.Metadata?.installed_at ? formatInstalledAt(backend.Metadata.installed_at) : '—'}
|
||||
</span>
|
||||
</td>
|
||||
<td>
|
||||
<div className="row-actions">
|
||||
{backend.IsSystem ? (
|
||||
<span className="badge" title="System backends are managed outside the gallery">
|
||||
<i className="fas fa-lock" /> Protected
|
||||
</span>
|
||||
) : (
|
||||
<>
|
||||
{upgradeInfo ? (
|
||||
<button
|
||||
className="btn btn-primary btn-sm"
|
||||
onClick={() => handleUpgradeBackend(backend.Name)}
|
||||
disabled={reinstallingBackends.has(backend.Name)}
|
||||
>
|
||||
<i className={`fas ${reinstallingBackends.has(backend.Name) ? 'fa-spinner fa-spin' : 'fa-arrow-up'}`} />
|
||||
{' '}Upgrade{upgradeInfo.available_version ? ` to v${upgradeInfo.available_version}` : ''}
|
||||
</button>
|
||||
) : (
|
||||
<button
|
||||
className="btn btn-secondary btn-sm"
|
||||
onClick={() => handleReinstallBackend(backend.Name)}
|
||||
disabled={reinstallingBackends.has(backend.Name)}
|
||||
>
|
||||
<i className={`fas ${reinstallingBackends.has(backend.Name) ? 'fa-spinner fa-spin' : 'fa-rotate'}`} />
|
||||
{' '}Reinstall
|
||||
</button>
|
||||
)}
|
||||
<button
|
||||
className={`btn ${upgrades[backend.Name] ? 'btn-primary' : 'btn-secondary'} btn-sm`}
|
||||
onClick={() => upgrades[backend.Name] ? handleUpgradeBackend(backend.Name) : handleReinstallBackend(backend.Name)}
|
||||
disabled={reinstallingBackends.has(backend.Name)}
|
||||
title={upgrades[backend.Name] ? `Upgrade to v${upgrades[backend.Name]?.available_version || 'latest'}` : 'Reinstall'}
|
||||
>
|
||||
<i className={`fas ${reinstallingBackends.has(backend.Name) ? 'fa-spinner fa-spin' : upgrades[backend.Name] ? 'fa-arrow-up' : 'fa-rotate'}`} />
|
||||
</button>
|
||||
<button
|
||||
className="btn btn-danger btn-sm"
|
||||
className="btn btn-danger-ghost btn-sm"
|
||||
onClick={() => handleDeleteBackend(backend.Name)}
|
||||
title="Delete"
|
||||
title="Delete backend (removes from all nodes)"
|
||||
>
|
||||
<i className="fas fa-trash" />
|
||||
</button>
|
||||
</>
|
||||
) : (
|
||||
<span style={{ fontSize: '0.75rem', color: 'var(--color-text-muted)' }}>—</span>
|
||||
)}
|
||||
</div>
|
||||
</td>
|
||||
</tr>
|
||||
))}
|
||||
)
|
||||
})}
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
</>
|
||||
)
|
||||
})()}
|
||||
</div>
|
||||
)}
|
||||
|
||||
|
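The Backends tab hunk above computes the "offline nodes" predicate twice (once for the chip count, once inside `passesFilter`) and combines a chip filter with a free-text search. As a sketch only, the same logic can be written as two pure functions. The field names (`Name`, `IsSystem`, `IsMeta`, `Nodes`/`nodes`, `node_status`/`NodeStatus`, `Metadata.alias`, `Metadata.meta_backend_for`) are taken from the diff; the function names `hasOfflineNode` and `filterBackends` and the options-object signature are assumptions made for illustration.

```javascript
// Hypothetical helpers; they restate the filter logic from the diff as pure functions.

// True when any node reports a status other than 'healthy' or 'draining'.
function hasOfflineNode(backend) {
  const nodes = backend.Nodes || backend.nodes || []
  return nodes.some(n => {
    const s = n.node_status || n.NodeStatus
    return Boolean(s) && s !== 'healthy' && s !== 'draining'
  })
}

// Applies the active chip filter and the search box to the backend list.
function filterBackends(backends, { filter = 'all', search = '', upgrades = {} } = {}) {
  const q = search.trim().toLowerCase()
  const passesSearch = b => !q
    || (b.Name || '').toLowerCase().includes(q)
    || (b.Metadata?.alias || '').toLowerCase().includes(q)
    || (b.Metadata?.meta_backend_for || '').toLowerCase().includes(q)
  const passesFilter = b => {
    switch (filter) {
      case 'user': return !b.IsSystem
      case 'system': return Boolean(b.IsSystem)
      case 'meta': return Boolean(b.IsMeta)
      case 'upgradable': return Boolean(upgrades[b.Name])
      case 'offline': return hasOfflineNode(b)
      default: return true
    }
  }
  return backends.filter(b => passesFilter(b) && passesSearch(b))
}
```

With a shared `hasOfflineNode`, the chip count could be `backends.filter(hasOfflineNode).length`, so the count and the filter cannot drift apart.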
||||
@@ -51,15 +51,22 @@ const modelStateConfig = {
|
||||
idle: { bg: 'var(--color-bg-tertiary)', color: 'var(--color-text-muted)', border: 'var(--color-border-subtle)' },
|
||||
}
|
||||
|
||||
function StatCard({ icon, label, value, color }) {
|
||||
function StatCard({ icon, label, value, color, accentVar }) {
|
||||
// accentVar: optional CSS variable for the left edge + icon chip, e.g.
|
||||
// "--color-success". When unset the card reads neutral — used for simple
|
||||
// counts so they don't compete with the semantic cards for attention.
|
||||
const accent = color || (accentVar ? `var(${accentVar})` : 'var(--color-text-primary)')
|
||||
return (
|
||||
<div className="card" style={{ padding: 'var(--spacing-sm) var(--spacing-md)', flex: '1 1 0', minWidth: 120 }}>
|
||||
<div style={{ display: 'flex', alignItems: 'center', gap: 6, marginBottom: 2 }}>
|
||||
<i className={icon} style={{ color: 'var(--color-text-muted)', fontSize: '0.75rem' }} />
|
||||
<span style={{ fontSize: '0.6875rem', color: 'var(--color-text-muted)', fontWeight: 500, textTransform: 'uppercase', letterSpacing: '0.03em' }}>{label}</span>
|
||||
<div
|
||||
className="stat-card"
|
||||
style={accentVar ? { ['--stat-accent']: `var(${accentVar})` } : undefined}
|
||||
>
|
||||
<div className="stat-card__body">
|
||||
<div className="stat-card__label">{label}</div>
|
||||
<div className="stat-card__value" style={{ color: accent }}>{value}</div>
|
||||
</div>
|
||||
<div style={{ fontSize: '1.375rem', fontWeight: 700, fontFamily: 'JetBrains Mono, monospace', color: color || 'var(--color-text-primary)' }}>
|
||||
{value}
|
||||
<div className="stat-card__icon" style={accentVar ? { color: accent } : undefined}>
|
||||
<i className={icon} />
|
||||
</div>
|
||||
</div>
|
||||
)
|
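The new `StatCard` resolves its value color from two optional props. A small sketch of that resolution order, assuming only what the `accent` expression in the hunk shows; the helper name `resolveAccent` is hypothetical:

```javascript
// Hypothetical helper mirroring the `accent` expression in StatCard:
// an explicit `color` wins, then `accentVar` wrapped in var(), then the neutral text color.
function resolveAccent(color, accentVar) {
  return color || (accentVar ? `var(${accentVar})` : 'var(--color-text-primary)')
}
```

For example, `resolveAccent(undefined, '--color-success')` yields `var(--color-success)`, while a card with neither prop falls back to the neutral text color.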
||||
@@ -543,45 +550,24 @@ export default function Nodes() {
|
||||
</div>
|
||||
|
||||
{/* Tabs */}
|
||||
<div style={{ display: 'flex', gap: 'var(--spacing-xs)', marginBottom: 'var(--spacing-lg)', borderBottom: '2px solid var(--color-border)' }}>
|
||||
<div className="tabs" style={{ marginBottom: 'var(--spacing-lg)' }}>
|
||||
<button
|
||||
onClick={() => setActiveTab('backend')}
|
||||
style={{
|
||||
padding: 'var(--spacing-sm) var(--spacing-lg)',
|
||||
border: 'none', cursor: 'pointer', fontWeight: 600, fontSize: '0.875rem',
|
||||
background: 'none',
|
||||
color: activeTab === 'backend' ? 'var(--color-primary)' : 'var(--color-text-muted)',
|
||||
borderBottom: activeTab === 'backend' ? '2px solid var(--color-primary)' : '2px solid transparent',
|
||||
marginBottom: '-2px',
|
||||
}}
|
||||
className={`tab ${activeTab === 'backend' ? 'tab-active' : ''}`}
|
||||
>
|
||||
<i className="fas fa-server" style={{ marginRight: 6 }} />
|
||||
Backend Workers ({backendNodes.length})
|
||||
</button>
|
||||
<button
|
||||
onClick={() => setActiveTab('agent')}
|
||||
style={{
|
||||
padding: 'var(--spacing-sm) var(--spacing-lg)',
|
||||
border: 'none', cursor: 'pointer', fontWeight: 600, fontSize: '0.875rem',
|
||||
background: 'none',
|
||||
color: activeTab === 'agent' ? 'var(--color-primary)' : 'var(--color-text-muted)',
|
||||
borderBottom: activeTab === 'agent' ? '2px solid var(--color-primary)' : '2px solid transparent',
|
||||
marginBottom: '-2px',
|
||||
}}
|
||||
className={`tab ${activeTab === 'agent' ? 'tab-active' : ''}`}
|
||||
>
|
||||
<i className="fas fa-robot" style={{ marginRight: 6 }} />
|
||||
Agent Workers ({agentNodes.length})
|
||||
</button>
|
||||
<button
|
||||
onClick={() => setActiveTab('scheduling')}
|
||||
style={{
|
||||
padding: 'var(--spacing-sm) var(--spacing-lg)',
|
||||
border: 'none', cursor: 'pointer', fontWeight: 600, fontSize: '0.875rem',
|
||||
background: 'none',
|
||||
color: activeTab === 'scheduling' ? 'var(--color-primary)' : 'var(--color-text-muted)',
|
||||
borderBottom: activeTab === 'scheduling' ? '2px solid var(--color-primary)' : '2px solid transparent',
|
||||
marginBottom: '-2px',
|
||||
}}
|
||||
className={`tab ${activeTab === 'scheduling' ? 'tab-active' : ''}`}
|
||||
>
|
||||
<i className="fas fa-calendar-alt" style={{ marginRight: 6 }} />
|
||||
Scheduling ({schedulingConfigs.length})
|
||||
@@ -590,13 +576,17 @@ export default function Nodes() {
|
||||
|
||||
{activeTab !== 'scheduling' && <>
|
||||
{/* Stat cards */}
|
||||
<div style={{ display: 'flex', gap: 'var(--spacing-md)', marginBottom: 'var(--spacing-xl)', flexWrap: 'wrap' }}>
|
||||
<StatCard icon={activeTab === 'agent' ? 'fas fa-robot' : 'fas fa-server'} label={`Total ${activeTab === 'agent' ? 'Agent' : 'Backend'} Workers`} value={total} />
|
||||
<StatCard icon="fas fa-check-circle" label="Healthy" value={healthy} color="var(--color-success)" />
|
||||
<StatCard icon="fas fa-exclamation-circle" label="Unhealthy" value={unhealthy} color={unhealthy > 0 ? 'var(--color-error)' : undefined} />
|
||||
<StatCard icon="fas fa-hourglass-half" label="Draining" value={draining} color={draining > 0 ? 'var(--color-warning)' : undefined} />
|
||||
<div className="stat-grid">
|
||||
<StatCard icon={activeTab === 'agent' ? 'fas fa-robot' : 'fas fa-server'}
|
||||
label={`Total ${activeTab === 'agent' ? 'Agent' : 'Backend'} Workers`} value={total} />
|
||||
<StatCard icon="fas fa-check-circle" label="Healthy" value={healthy}
|
||||
accentVar={healthy > 0 ? '--color-success' : undefined} />
|
||||
<StatCard icon="fas fa-exclamation-circle" label="Unhealthy" value={unhealthy}
|
||||
accentVar={unhealthy > 0 ? '--color-error' : undefined} />
|
||||
<StatCard icon="fas fa-hourglass-half" label="Draining" value={draining}
|
||||
accentVar={draining > 0 ? '--color-warning' : undefined} />
|
||||
{pending > 0 && (
|
||||
<StatCard icon="fas fa-clock" label="Pending" value={pending} color="var(--color-warning)" />
|
||||
<StatCard icon="fas fa-clock" label="Pending" value={pending} accentVar="--color-warning" />
|
||||
)}
|
||||
{activeTab === 'backend' && (() => {
|
||||
const clusterTotalVRAM = backendNodes.reduce((sum, n) => sum + (n.total_vram || 0), 0)
|
||||
@@ -614,7 +604,7 @@ export default function Nodes() {
|
||||
)}
|
||||
<StatCard icon="fas fa-cube" label="Models Loaded" value={totalModelsLoaded} />
|
||||
<StatCard icon="fas fa-exchange-alt" label="In-Flight Requests" value={totalInFlight}
|
||||
color={totalInFlight > 0 ? 'var(--color-primary)' : undefined} />
|
||||
accentVar={totalInFlight > 0 ? '--color-primary' : undefined} />
|
||||
</>
|
||||
)
|
||||
})()}
|
||||
@@ -627,15 +617,11 @@ export default function Nodes() {
|
||||
<>
|
||||
<button
|
||||
onClick={() => setShowTips(t => !t)}
|
||||
style={{
|
||||
background: 'none', border: 'none', cursor: 'pointer',
|
||||
color: 'var(--color-primary)', fontSize: '0.8125rem', fontWeight: 500,
|
||||
display: 'flex', alignItems: 'center', gap: 6,
|
||||
padding: 0, marginBottom: 'var(--spacing-md)',
|
||||
}}
|
||||
className="nodes-add-worker"
|
||||
aria-expanded={showTips}
|
||||
>
|
||||
<i className={`fas fa-chevron-${showTips ? 'down' : 'right'}`} style={{ fontSize: '0.625rem' }} />
|
||||
Add more workers
|
||||
<i className={`fas ${showTips ? 'fa-chevron-down' : 'fa-plus'}`} />
|
||||
{showTips ? 'Hide instructions' : 'Register a new worker'}
|
||||
</button>
|
||||
{showTips && <WorkerHintCard addToast={addToast} activeTab={activeTab} hasWorkers />}
|
||||
</>
|
||||
@@ -685,23 +671,28 @@ export default function Nodes() {
|
||||
>
|
||||
<td>
|
||||
<div style={{ display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)' }}>
|
||||
<i className="fas fa-server" style={{ color: 'var(--color-text-muted)', fontSize: '0.875rem' }} />
|
||||
{canExpand && (
|
||||
<span className={`row-chevron${isExpanded ? ' is-expanded' : ''}`} aria-hidden="true">
|
||||
<i className="fas fa-chevron-right" />
|
||||
</span>
|
||||
)}
|
||||
<i className="fas fa-server" style={{ color: 'var(--color-text-muted)', fontSize: 'var(--text-sm)' }} />
|
||||
<div>
|
||||
<div style={{ fontWeight: 600, fontSize: '0.875rem' }}>{node.name}</div>
|
||||
<div style={{ fontSize: '0.75rem', fontFamily: "'JetBrains Mono', monospace", color: 'var(--color-text-muted)' }}>
|
||||
<div style={{ fontWeight: 600, fontSize: 'var(--text-sm)' }}>{node.name}</div>
|
||||
<div className="cell-mono cell-muted">
|
||||
{node.address}
|
||||
</div>
|
||||
{node.labels && Object.keys(node.labels).length > 0 && (
|
||||
<div style={{ display: 'flex', flexWrap: 'wrap', gap: 3, marginTop: 3 }}>
|
||||
{Object.entries(node.labels).slice(0, 5).map(([k, v]) => (
|
||||
<span key={k} style={{
|
||||
fontSize: '0.625rem', padding: '1px 5px', borderRadius: 3,
|
||||
background: 'var(--color-bg-tertiary)', color: 'var(--color-text-muted)',
|
||||
fontFamily: "'JetBrains Mono', monospace", border: '1px solid var(--color-border-subtle)',
|
||||
<span key={k} className="cell-mono" style={{
|
||||
padding: '1px 5px', borderRadius: 3,
|
||||
background: 'var(--color-bg-tertiary)',
|
||||
border: '1px solid var(--color-border-subtle)',
|
||||
}}>{k}={v}</span>
|
||||
))}
|
||||
{Object.keys(node.labels).length > 5 && (
|
||||
<span style={{ fontSize: '0.625rem', color: 'var(--color-text-muted)' }}>
|
||||
<span className="cell-muted">
|
||||
+{Object.keys(node.labels).length - 5} more
|
||||
</span>
|
||||
)}
|
||||
@@ -711,12 +702,10 @@ export default function Nodes() {
|
||||
</div>
|
||||
</td>
|
||||
<td>
|
||||
<div style={{ display: 'flex', alignItems: 'center', gap: 6 }}>
|
||||
<i className="fas fa-circle" style={{ fontSize: '0.5rem', color: status.color }} />
|
||||
<span style={{ fontSize: '0.8125rem', color: status.color, fontWeight: 500 }}>
|
||||
{status.label}
|
||||
</span>
|
||||
</div>
|
||||
<span className="node-status" style={{ color: status.color }}>
|
||||
<span className="node-status__dot" style={{ background: status.color }} />
|
||||
{status.label}
|
||||
</span>
|
||||
</td>
|
||||
<td>
|
||||
{hasGPU && totalVRAMStr ? (
|
||||
@@ -745,38 +734,37 @@ export default function Nodes() {
|
||||
</span>
|
||||
</td>
|
||||
<td style={{ textAlign: 'right' }}>
|
||||
<div style={{ display: 'flex', gap: 'var(--spacing-xs)', justifyContent: 'flex-end' }} onClick={e => e.stopPropagation()}>
|
||||
<div className="row-actions" onClick={e => e.stopPropagation()}>
|
||||
{node.status === 'pending' && (
|
||||
<button
|
||||
className="btn btn-primary btn-sm"
|
||||
onClick={() => handleApprove(node.id)}
|
||||
title="Approve node"
|
||||
>
|
||||
<i className="fas fa-check" />
|
||||
<i className="fas fa-check" /> Approve
|
||||
</button>
|
||||
)}
|
||||
{node.status === 'draining' && (
|
||||
<button
|
||||
className="btn btn-secondary btn-sm"
|
||||
onClick={() => handleResume(node.id)}
|
||||
title="Resume node"
|
||||
title="Resume accepting requests"
|
||||
>
|
||||
<i className="fas fa-play" />
|
||||
<i className="fas fa-play" /> Resume
|
||||
</button>
|
||||
)}
|
||||
{node.status !== 'draining' && node.status !== 'pending' && (
|
||||
<button
|
||||
className="btn btn-secondary btn-sm"
|
||||
onClick={() => handleDrain(node.id)}
|
||||
title="Drain node"
|
||||
title="Stop sending new requests to this node"
|
||||
>
|
||||
<i className="fas fa-pause" />
|
||||
<i className="fas fa-pause" /> Drain
|
||||
</button>
|
||||
)}
|
||||
<button
|
||||
className="btn btn-danger btn-sm"
|
||||
className="btn btn-danger-ghost btn-sm"
|
||||
onClick={() => setConfirmDelete(node)}
|
||||
title="Remove node"
|
||||
title="Remove node from cluster"
|
||||
>
|
||||
<i className="fas fa-trash" />
|
||||
</button>
|
||||
@@ -794,7 +782,10 @@ export default function Nodes() {
|
||||
{!models ? (
|
||||
<LoadingSpinner size="sm" />
|
||||
) : models.length === 0 ? (
|
||||
<p style={{ fontSize: '0.8125rem', color: 'var(--color-text-muted)' }}>No models loaded on this node</p>
|
||||
<div className="drawer-empty">
|
||||
<i className="fas fa-cube" />
|
||||
<span>No models loaded on this node yet.</span>
|
||||
</div>
|
||||
) : (
|
||||
<table className="table" style={{ margin: 0 }}>
|
||||
<thead>
|
||||
@@ -870,7 +861,10 @@ export default function Nodes() {
|
||||
{!backends ? (
|
||||
<LoadingSpinner size="sm" />
|
||||
) : backends.length === 0 ? (
|
||||
<p style={{ fontSize: '0.8125rem', color: 'var(--color-text-muted)' }}>No backends installed on this node</p>
|
||||
<div className="drawer-empty">
|
||||
<i className="fas fa-cogs" />
|
||||
<span>No backends installed on this node. Install one from the gallery to schedule models here.</span>
|
||||
</div>
|
||||
) : (
|
||||
<table className="table" style={{ margin: 0 }}>
|
||||
<thead>
|
||||
|
||||
@@ -23,7 +23,6 @@ import (
|
||||
"github.com/mudler/LocalAI/core/gallery"
|
||||
"github.com/mudler/LocalAI/core/http/auth"
|
||||
"github.com/mudler/LocalAI/core/http/endpoints/localai"
|
||||
"github.com/mudler/LocalAI/core/http/middleware"
|
||||
"github.com/mudler/LocalAI/core/p2p"
|
||||
"github.com/mudler/LocalAI/core/services/galleryop"
|
||||
"github.com/mudler/LocalAI/pkg/model"
|
||||
@@ -510,28 +509,89 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
|
||||
modelConfigs := cl.GetAllModelsConfigs()
|
||||
modelsWithoutConfig, _ := galleryop.ListModels(cl, ml, config.NoFilterFn, galleryop.LOOSE_ONLY)
|
||||
|
||||
type loadedOn struct {
|
||||
NodeID string `json:"node_id"`
|
||||
NodeName string `json:"node_name"`
|
||||
State string `json:"state"`
|
||||
NodeStatus string `json:"node_status"`
|
||||
}
|
||||
type modelCapability struct {
|
||||
ID string `json:"id"`
|
||||
Capabilities []string `json:"capabilities"`
|
||||
Backend string `json:"backend"`
|
||||
Disabled bool `json:"disabled"`
|
||||
Pinned bool `json:"pinned"`
|
||||
ID string `json:"id"`
|
||||
Capabilities []string `json:"capabilities"`
|
||||
Backend string `json:"backend"`
|
||||
Disabled bool `json:"disabled"`
|
||||
Pinned bool `json:"pinned"`
|
||||
// LoadedOn is populated only when the node registry is active
|
||||
// (distributed mode). Lets the UI show "loaded on worker-1" without
|
||||
// the operator having to expand every node manually. An empty slice
|
||||
// means "no loaded replicas" while nil means "not in cluster mode";
|
||||
// the frontend treats both as "no distribution info".
|
||||
LoadedOn []loadedOn `json:"loaded_on,omitempty"`
|
||||
// Source="registry-only" marks models adopted from the cluster that
|
||||
// have no local config yet (ghosts that the reconciler discovered).
|
||||
Source string `json:"source,omitempty"`
|
||||
}
|
||||
|
||||
// Join with the node registry when we have one (distributed mode). A
|
||||
// single registry fetch + map join beats per-model queries for the
|
||||
// 100-model case.
|
||||
var loadedByModel map[string][]loadedOn
|
||||
if ds := applicationInstance.Distributed(); ds != nil && ds.Registry != nil {
|
||||
nodeModels, err := ds.Registry.ListAllLoadedModels(c.Request().Context())
|
||||
if err == nil {
|
||||
allNodes, _ := ds.Registry.List(c.Request().Context())
|
||||
nameByID := make(map[string]string, len(allNodes))
|
||||
statusByID := make(map[string]string, len(allNodes))
|
||||
for _, n := range allNodes {
|
||||
nameByID[n.ID] = n.Name
|
||||
statusByID[n.ID] = n.Status
|
||||
}
|
||||
loadedByModel = make(map[string][]loadedOn)
|
||||
for _, nm := range nodeModels {
|
||||
loadedByModel[nm.ModelName] = append(loadedByModel[nm.ModelName], loadedOn{
|
||||
NodeID: nm.NodeID,
|
||||
NodeName: nameByID[nm.NodeID],
|
||||
State: nm.State,
|
||||
NodeStatus: statusByID[nm.NodeID],
|
||||
})
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
result := make([]modelCapability, 0, len(modelConfigs)+len(modelsWithoutConfig))
|
||||
seen := make(map[string]bool, len(modelConfigs)+len(modelsWithoutConfig))
|
||||
for _, cfg := range modelConfigs {
|
||||
seen[cfg.Name] = true
|
||||
result = append(result, modelCapability{
|
||||
ID: cfg.Name,
|
||||
Capabilities: cfg.KnownUsecaseStrings,
|
||||
Backend: cfg.Backend,
|
||||
Disabled: cfg.IsDisabled(),
|
||||
Pinned: cfg.IsPinned(),
|
||||
LoadedOn: loadedByModel[cfg.Name],
|
||||
})
|
||||
}
|
||||
for _, name := range modelsWithoutConfig {
|
||||
seen[name] = true
|
||||
result = append(result, modelCapability{
|
||||
ID: name,
|
||||
Capabilities: []string{},
|
||||
LoadedOn: loadedByModel[name],
|
||||
})
|
||||
}
|
||||
// Emit entries for cluster models that have no local config — these
|
||||
// are the actual ghosts. Without this the operator would have no way
|
||||
// to see a model the cluster is running if its config file wasn't
|
||||
// synced to this frontend's filesystem.
|
||||
for name, loc := range loadedByModel {
|
||||
if seen[name] {
|
||||
continue
|
||||
}
|
||||
result = append(result, modelCapability{
|
||||
ID: name,
|
||||
Capabilities: []string{},
|
||||
LoadedOn: loc,
|
||||
Source: "registry-only",
|
||||
})
|
||||
}
|
||||
|
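The Go handler in this hunk fetches the registry once, groups loaded models by name, and then emits an extra entry for every cluster model that has no local config. The sketch below restates that join in JavaScript purely for illustration; it is not the server code. The input shapes (`NodeID`, `ModelName`, `State`, `ID`, `Name`, `Status`) follow the Go fields in the diff, the output keys follow the JSON tags, and the function name `joinLoadedModels` is an assumption.

```javascript
// Illustrative transliteration of the handler's join; names are hypothetical.
// nodeModels: [{ NodeID, ModelName, State }], nodes: [{ ID, Name, Status }]
function joinLoadedModels(configuredNames, nodeModels, nodes) {
  // One pass over the node list builds the lookup used for names and statuses.
  const nodeByID = new Map(nodes.map(n => [n.ID, n]))
  const loadedByModel = new Map()
  for (const nm of nodeModels) {
    const node = nodeByID.get(nm.NodeID)
    const entry = {
      node_id: nm.NodeID,
      node_name: node ? node.Name : '', // unknown node: empty string, like a missing Go map key
      state: nm.State,
      node_status: node ? node.Status : '',
    }
    if (!loadedByModel.has(nm.ModelName)) loadedByModel.set(nm.ModelName, [])
    loadedByModel.get(nm.ModelName).push(entry)
  }
  const seen = new Set(configuredNames)
  const result = configuredNames.map(name => ({ id: name, loaded_on: loadedByModel.get(name) }))
  // Models known only to the registry are surfaced as "registry-only" entries.
  for (const [name, loc] of loadedByModel) {
    if (seen.has(name)) continue
    result.push({ id: name, loaded_on: loc, source: 'registry-only' })
  }
  return result
}
```

The design point is the same as in the handler's comment: one registry read plus an in-memory join, rather than one registry query per model.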
||||
@@ -1397,24 +1457,5 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
|
||||
app.POST("/api/settings", localai.UpdateSettingsEndpoint(applicationInstance), adminMiddleware)
|
||||
}
|
||||
|
||||
// Logs API (admin only)
|
||||
app.GET("/api/traces", func(c echo.Context) error {
|
||||
if !appConfig.EnableTracing {
|
||||
return c.JSON(503, map[string]any{
|
||||
"error": "Tracing disabled",
|
||||
})
|
||||
}
|
||||
traces := middleware.GetTraces()
|
||||
return c.JSON(200, map[string]any{
|
||||
"traces": traces,
|
||||
})
|
||||
}, adminMiddleware)
|
||||
|
||||
app.POST("/api/traces/clear", func(c echo.Context) error {
|
||||
middleware.ClearTraces()
|
||||
return c.JSON(200, map[string]any{
|
||||
"message": "Traces cleared",
|
||||
})
|
||||
}, adminMiddleware)
|
||||
}
|
||||
|
||||
|
||||
@@ -11,4 +11,5 @@ const (
|
||||
KeyHealthCheck int64 = 104
|
||||
KeySchemaMigrate int64 = 105
|
||||
KeyBackendUpgradeCheck int64 = 106
|
||||
KeyStateReconciler int64 = 107
|
||||
)
|
||||
|
||||
@@ -57,6 +57,16 @@ func (g *GalleryService) SetBackendManager(b BackendManager) {
|
||||
g.backendManager = b
|
||||
}
|
||||
|
||||
// BackendManager returns the current backend manager. Callers like the
|
||||
// periodic upgrade checker need this so they run CheckUpgrades through the
|
||||
// distributed implementation (which asks workers) instead of the frontend's
|
||||
// local filesystem — the latter is always empty in distributed deployments.
|
||||
func (g *GalleryService) BackendManager() BackendManager {
|
||||
g.Lock()
|
||||
defer g.Unlock()
|
||||
return g.backendManager
|
||||
}
|
||||
|
||||
// SetNATSClient sets the NATS client for distributed progress publishing.
|
||||
func (g *GalleryService) SetNATSClient(nc messaging.Publisher) {
|
||||
g.Lock()
|
||||
|
||||
@@ -124,8 +124,13 @@ func SubjectNodeBackendInstall(nodeID string) string {
|
||||
// BackendInstallRequest is the payload for a backend.install NATS request.
|
||||
type BackendInstallRequest struct {
|
||||
Backend string `json:"backend"`
|
||||
ModelID string `json:"model_id,omitempty"` // unique model identifier — each model gets its own gRPC process
|
||||
ModelID string `json:"model_id,omitempty"`
|
||||
BackendGalleries string `json:"backend_galleries,omitempty"`
|
||||
// URI is set for external installs (OCI image, URL, or path). When non-empty
|
||||
// the worker routes to InstallExternalBackend instead of the gallery lookup.
|
||||
URI string `json:"uri,omitempty"`
|
||||
Name string `json:"name,omitempty"`
|
||||
Alias string `json:"alias,omitempty"`
|
||||
}
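The comment on `URI` describes a routing rule: a non-empty URI selects the external install path, otherwise the worker falls back to the gallery lookup. The sketch below copies the JSON shape of the struct from the diff and shows how `omitempty` keeps gallery requests small. The `route` function and the example URI are hypothetical; the worker's real dispatch code is not part of this excerpt.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// installRequest copies the JSON shape of BackendInstallRequest from the diff.
type installRequest struct {
	Backend          string `json:"backend"`
	ModelID          string `json:"model_id,omitempty"`
	BackendGalleries string `json:"backend_galleries,omitempty"`
	URI              string `json:"uri,omitempty"`
	Name             string `json:"name,omitempty"`
	Alias            string `json:"alias,omitempty"`
}

// route is a hypothetical worker-side dispatch: a non-empty URI selects the
// external install path, anything else falls back to the gallery lookup.
func route(req installRequest) string {
	if req.URI != "" {
		return "external"
	}
	return "gallery"
}

func main() {
	fromGallery := installRequest{Backend: "llama-cpp"}
	external := installRequest{
		Backend: "custom",
		URI:     "oci://example.invalid/custom:latest", // placeholder URI
		Name:    "custom",
	}
	for _, req := range []installRequest{fromGallery, external} {
		payload, err := json.Marshal(req)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", payload, route(req))
	}
}
```

Because the new fields are all `omitempty`, a request without them serializes exactly as it did before the change, which should keep older workers able to decode it.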
|
||||
|
||||
// BackendInstallReply is the response from a backend.install NATS request.
|
||||
@@ -157,6 +162,12 @@ type NodeBackendInfo struct {
|
||||
IsMeta bool `json:"is_meta"`
|
||||
InstalledAt string `json:"installed_at,omitempty"`
|
||||
GalleryURL string `json:"gallery_url,omitempty"`
|
||||
// Version, URI and Digest enable cluster-wide upgrade detection —
|
||||
// without them, the frontend cannot tell whether the installed OCI
|
||||
// image matches the gallery entry, and upgrades silently never surface.
|
||||
Version string `json:"version,omitempty"`
|
||||
URI string `json:"uri,omitempty"`
|
||||
Digest string `json:"digest,omitempty"`
|
||||
}
|
||||
|
||||
// SubjectNodeBackendStop tells a worker node to stop its gRPC backend process.
|
||||
|
||||
@@ -10,6 +10,7 @@ import (
|
||||
"github.com/mudler/LocalAI/core/gallery"
|
||||
"github.com/mudler/LocalAI/core/services/galleryop"
|
||||
"github.com/mudler/LocalAI/pkg/model"
|
||||
"github.com/mudler/LocalAI/pkg/system"
|
||||
"github.com/mudler/xlog"
|
||||
"github.com/nats-io/nats.go"
|
||||
)
|
||||
@@ -53,6 +54,7 @@ type DistributedBackendManager struct {
|
||||
adapter *RemoteUnloaderAdapter
|
||||
registry *NodeRegistry
|
||||
backendGalleries []config.Gallery
|
||||
systemState *system.SystemState
|
||||
}
|
||||
|
||||
// NewDistributedBackendManager creates a DistributedBackendManager.
|
||||
@@ -62,46 +64,168 @@ func NewDistributedBackendManager(appConfig *config.ApplicationConfig, ml *model
|
||||
adapter: adapter,
|
||||
registry: registry,
|
||||
backendGalleries: appConfig.BackendGalleries,
|
||||
systemState: appConfig.SystemState,
|
||||
}
|
||||
}
|
||||
|
||||
// NodeOpStatus is the per-node outcome of a backend lifecycle operation.
|
||||
// Returned as part of BackendOpResult so the frontend can surface exactly
|
||||
// what happened on each worker instead of a single joined error string.
|
||||
type NodeOpStatus struct {
|
||||
NodeID string `json:"node_id"`
|
||||
NodeName string `json:"node_name"`
|
||||
Status string `json:"status"` // "success" | "queued" | "error"
|
||||
Error string `json:"error,omitempty"`
|
||||
}
|
||||
|
||||
// BackendOpResult aggregates per-node outcomes.
|
||||
type BackendOpResult struct {
|
||||
Nodes []NodeOpStatus `json:"nodes"`
|
||||
}
|
||||
|
||||
// enqueueAndDrainBackendOp is the shared scaffolding for
|
||||
// delete/install/upgrade. Every non-pending node gets a pending_backend_ops
|
||||
// row (intent is durable even if the node is offline). Currently-healthy
|
||||
// nodes get an immediate attempt; success deletes the row, failure records
|
||||
// the error and leaves the row for the reconciler to retry.
|
||||
//
|
||||
// `apply` is the NATS round-trip for one node. Returning an error keeps the
|
||||
// row in the queue and marks the per-node status as "error"; returning nil
|
||||
// deletes the row and reports "success". For non-healthy nodes the status
|
||||
// is "queued" — no attempt is made right now, reconciler will pick it up
|
||||
// when the node returns.
|
||||
func (d *DistributedBackendManager) enqueueAndDrainBackendOp(ctx context.Context, op, backend string, galleriesJSON []byte, apply func(node BackendNode) error) (BackendOpResult, error) {
|
||||
allNodes, err := d.registry.List(ctx)
|
||||
if err != nil {
|
||||
return BackendOpResult{}, err
|
||||
}
|
||||
|
||||
result := BackendOpResult{Nodes: make([]NodeOpStatus, 0, len(allNodes))}
|
||||
for _, node := range allNodes {
|
||||
// Pending nodes haven't been approved yet — no intent to apply.
|
||||
if node.Status == StatusPending {
|
||||
continue
|
||||
}
|
||||
// Backend lifecycle ops only make sense on backend-type workers.
|
||||
// Agent workers don't subscribe to backend.install/delete/list, so
|
||||
// enqueueing for them guarantees a forever-retrying row that the
|
||||
// reconciler can never drain. Silently skip — they aren't consumers.
|
||||
if node.NodeType != "" && node.NodeType != NodeTypeBackend {
|
||||
continue
|
||||
}
|
||||
if err := d.registry.UpsertPendingBackendOp(ctx, node.ID, backend, op, galleriesJSON); err != nil {
|
||||
xlog.Warn("Failed to enqueue backend op", "op", op, "node", node.Name, "backend", backend, "error", err)
|
||||
result.Nodes = append(result.Nodes, NodeOpStatus{
|
||||
NodeID: node.ID, NodeName: node.Name, Status: "error",
|
||||
Error: fmt.Sprintf("enqueue failed: %v", err),
|
||||
})
|
||||
continue
|
||||
}
|
||||
|
||||
if node.Status != StatusHealthy {
|
||||
// Intent is recorded; reconciler will retry when the node recovers.
|
||||
result.Nodes = append(result.Nodes, NodeOpStatus{
|
||||
NodeID: node.ID, NodeName: node.Name, Status: "queued",
|
||||
Error: fmt.Sprintf("node %s, will retry when healthy", node.Status),
|
||||
})
|
||||
continue
|
||||
}
|
||||
|
||||
applyErr := apply(node)
|
||||
if applyErr == nil {
|
||||
// Delete the row we just upserted. UpsertPendingBackendOp doesn't return
|
||||
// the row ID, so the delete is keyed on (node ID, backend, op) instead.
|
||||
if err := d.deletePendingRow(ctx, node.ID, backend, op); err != nil {
|
||||
xlog.Debug("Failed to clear pending backend op after success", "error", err)
|
||||
}
|
||||
result.Nodes = append(result.Nodes, NodeOpStatus{
|
||||
NodeID: node.ID, NodeName: node.Name, Status: "success",
|
||||
})
|
||||
continue
|
||||
}
|
||||
|
||||
// Record failure for backoff. ErrNoResponders means the node has no live
|
||||
// NATS subscription, so mark it unhealthy and the router stops picking it too.
|
||||
errMsg := applyErr.Error()
|
||||
if errors.Is(applyErr, nats.ErrNoResponders) {
|
||||
xlog.Warn("No NATS responders for node, marking unhealthy", "node", node.Name, "nodeID", node.ID)
|
||||
d.registry.MarkUnhealthy(ctx, node.ID)
|
||||
}
|
||||
if id, err := d.findPendingRow(ctx, node.ID, backend, op); err == nil {
|
||||
_ = d.registry.RecordPendingBackendOpFailure(ctx, id, errMsg)
|
||||
}
|
||||
result.Nodes = append(result.Nodes, NodeOpStatus{
|
||||
NodeID: node.ID, NodeName: node.Name, Status: "error", Error: errMsg,
|
||||
})
|
||||
}
|
||||
return result, nil
|
||||
}
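The per-node decision in `enqueueAndDrainBackendOp` can be read as a small classification: skip, queued, success or error. The sketch below isolates that decision with no registry or NATS involved. The `node` type, the literal status and node-type strings and the `outcome` helper are assumptions made for illustration; the real code uses the `StatusPending`, `StatusHealthy` and `NodeTypeBackend` constants, whose values are not shown in this diff.

```go
package main

import (
	"errors"
	"fmt"
)

// errApply is a stand-in for a failed NATS round-trip.
var errApply = errors.New("apply failed")

// node is a hypothetical, reduced view of a registry node.
type node struct {
	Name     string
	Status   string // assumed values: "pending", "healthy", "offline", ...
	NodeType string // "" and "backend" are both treated as backend workers
}

// outcome returns the per-node status string, or "" when the node is skipped.
// apply stands in for the NATS round-trip to a single node.
func outcome(n node, apply func(node) error) string {
	if n.Status == "pending" {
		return "" // not approved yet, no intent to record
	}
	if n.NodeType != "" && n.NodeType != "backend" {
		return "" // agent workers never consume backend ops
	}
	// The real function upserts the durable queue row at this point.
	if n.Status != "healthy" {
		return "queued" // the reconciler retries once the node recovers
	}
	if err := apply(n); err != nil {
		return "error" // the row stays in the queue for backoff and retry
	}
	return "success" // the row is deleted
}

func main() {
	apply := func(n node) error {
		if n.Name == "flaky" {
			return errApply
		}
		return nil
	}
	nodes := []node{
		{Name: "new", Status: "pending"},
		{Name: "agent", Status: "healthy", NodeType: "agent"},
		{Name: "down", Status: "offline", NodeType: "backend"},
		{Name: "flaky", Status: "healthy", NodeType: "backend"},
		{Name: "ok", Status: "healthy"},
	}
	for _, n := range nodes {
		fmt.Printf("%-6s -> %q\n", n.Name, outcome(n, apply))
	}
}
```

Note the ordering: the intent is recorded before the health check, which is why an offline node ends up "queued" rather than skipped.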
|
||||
|
||||
// findPendingRow looks up the ID of a pending_backend_ops row by its
|
||||
// composite key. Used to hand off to RecordPendingBackendOpFailure /
|
||||
// DeletePendingBackendOp after UpsertPendingBackendOp upserts by the same
|
||||
// composite key.
|
||||
func (d *DistributedBackendManager) findPendingRow(ctx context.Context, nodeID, backend, op string) (uint, error) {
|
||||
var row PendingBackendOp
|
||||
if err := d.registry.db.WithContext(ctx).
|
||||
Where("node_id = ? AND backend = ? AND op = ?", nodeID, backend, op).
|
||||
First(&row).Error; err != nil {
|
||||
return 0, err
|
||||
}
|
||||
return row.ID, nil
|
||||
}
|
||||
|
||||
// deletePendingRow removes the queue row keyed by (nodeID, backend, op).
|
||||
func (d *DistributedBackendManager) deletePendingRow(ctx context.Context, nodeID, backend, op string) error {
|
||||
return d.registry.db.WithContext(ctx).
|
||||
Where("node_id = ? AND backend = ? AND op = ?", nodeID, backend, op).
|
||||
Delete(&PendingBackendOp{}).Error
|
||||
}
|
||||
|
||||
// DeleteBackend fans out backend deletion to every known node. The previous
|
||||
// implementation silently skipped non-healthy nodes, which meant zombies
|
||||
// reappeared once those nodes returned. Now the intent is durable — see
|
||||
// enqueueAndDrainBackendOp — and the reconciler catches up later.
|
||||
func (d *DistributedBackendManager) DeleteBackend(name string) error {
|
||||
// Try local deletion but ignore "not found" errors — in distributed mode
|
||||
// the frontend node typically doesn't have backends installed locally;
|
||||
// they only exist on worker nodes.
|
||||
// Local delete first (frontend rarely has backends installed in
|
||||
// distributed mode, but the gallery operation still expects it; ignore
|
||||
// "not found" which is the common case).
|
||||
if err := d.local.DeleteBackend(name); err != nil {
|
||||
if !errors.Is(err, gallery.ErrBackendNotFound) {
|
||||
return err
|
||||
}
|
||||
xlog.Debug("Backend not found locally, will attempt deletion on workers", "backend", name)
|
||||
}
|
||||
// Fan out backend.delete to all healthy nodes
|
||||
allNodes, listErr := d.registry.List(context.Background())
|
||||
if listErr != nil {
|
||||
xlog.Warn("Failed to list nodes for backend deletion fan-out", "error", listErr)
|
||||
return listErr
|
||||
}
|
||||
var errs []error
|
||||
for _, node := range allNodes {
|
||||
if node.Status != StatusHealthy {
|
||||
continue
|
||||
}
|
||||
if _, delErr := d.adapter.DeleteBackend(node.ID, name); delErr != nil {
|
||||
if errors.Is(delErr, nats.ErrNoResponders) {
|
||||
// Node's NATS subscription is gone — likely restarted with a new ID.
|
||||
// Mark it unhealthy so future fan-outs skip it.
|
||||
xlog.Warn("No NATS responders for node, marking unhealthy", "node", node.Name, "nodeID", node.ID)
|
||||
d.registry.MarkUnhealthy(context.Background(), node.ID)
|
||||
continue
|
||||
}
|
||||
xlog.Warn("Failed to propagate backend deletion to worker", "node", node.Name, "backend", name, "error", delErr)
|
||||
errs = append(errs, fmt.Errorf("node %s: %w", node.Name, delErr))
|
||||
}
|
||||
}
|
||||
return errors.Join(errs...)
|
||||
|
||||
ctx := context.Background()
|
||||
_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendDelete, name, nil, func(node BackendNode) error {
|
||||
_, err := d.adapter.DeleteBackend(node.ID, name)
|
||||
return err
|
||||
})
|
||||
return err
|
||||
}
|
||||
|
||||
// ListBackends aggregates installed backends from all healthy worker nodes.
|
||||
// DeleteBackendDetailed is the per-node-result variant called by the HTTP
|
||||
// handler so the UI can render a per-node status drawer. DeleteBackend still
|
||||
// returns error-only for callers that don't care about node breakdown.
|
||||
func (d *DistributedBackendManager) DeleteBackendDetailed(ctx context.Context, name string) (BackendOpResult, error) {
|
||||
if err := d.local.DeleteBackend(name); err != nil && !errors.Is(err, gallery.ErrBackendNotFound) {
|
||||
return BackendOpResult{}, err
|
||||
}
|
||||
return d.enqueueAndDrainBackendOp(ctx, OpBackendDelete, name, nil, func(node BackendNode) error {
|
||||
_, err := d.adapter.DeleteBackend(node.ID, name)
|
||||
return err
|
||||
})
|
||||
}
|
||||
|
||||
// ListBackends aggregates installed backends from all worker nodes, preserving
|
||||
// per-node attribution. Each SystemBackend.Nodes entry records which node has
|
||||
// the backend and the version/digest it reports. The top-level Metadata is
|
||||
// populated from the first node seen so single-node-minded callers still work.
|
||||
//
|
||||
// Pending/offline/draining nodes are skipped because they aren't expected to
|
||||
// answer NATS requests; unhealthy nodes are still queried — ErrNoResponders
|
||||
// then marks them unhealthy and the loop continues.
|
||||
func (d *DistributedBackendManager) ListBackends() (gallery.SystemBackends, error) {
|
||||
result := make(gallery.SystemBackends)
|
||||
allNodes, err := d.registry.List(context.Background())
|
||||
@@ -110,7 +234,7 @@ func (d *DistributedBackendManager) ListBackends() (gallery.SystemBackends, erro
|
||||
}
|
||||
|
||||
for _, node := range allNodes {
|
||||
if node.Status != StatusHealthy {
|
||||
if node.Status == StatusPending || node.Status == StatusOffline || node.Status == StatusDraining {
|
||||
continue
|
||||
}
|
||||
reply, err := d.adapter.ListBackends(node.ID)
|
||||
@@ -128,89 +252,92 @@ func (d *DistributedBackendManager) ListBackends() (gallery.SystemBackends, erro
|
||||
continue
|
||||
}
|
||||
for _, b := range reply.Backends {
|
||||
if _, exists := result[b.Name]; !exists {
|
||||
result[b.Name] = gallery.SystemBackend{
|
||||
ref := gallery.NodeBackendRef{
|
||||
NodeID: node.ID,
|
||||
NodeName: node.Name,
|
||||
NodeStatus: node.Status,
|
||||
Version: b.Version,
|
||||
Digest: b.Digest,
|
||||
URI: b.URI,
|
||||
InstalledAt: b.InstalledAt,
|
||||
}
|
||||
entry, exists := result[b.Name]
|
||||
if !exists {
|
||||
entry = gallery.SystemBackend{
|
||||
Name: b.Name,
|
||||
IsSystem: b.IsSystem,
|
||||
IsMeta: b.IsMeta,
|
||||
Metadata: &gallery.BackendMetadata{
|
||||
Name: b.Name,
|
||||
InstalledAt: b.InstalledAt,
|
||||
GalleryURL: b.GalleryURL,
|
||||
Version: b.Version,
|
||||
URI: b.URI,
|
||||
Digest: b.Digest,
|
||||
},
|
||||
}
|
||||
}
|
||||
entry.Nodes = append(entry.Nodes, ref)
|
||||
result[b.Name] = entry
|
||||
}
|
||||
}
|
||||
return result, nil
|
||||
}
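The rewritten loop in `ListBackends` reads the map entry, appends a node reference, then stores the entry back. The write-back is needed because Go map elements of struct type are copies and their fields cannot be assigned in place. A minimal sketch of that aggregation follows, using hypothetical `report`, `nodeRef` and `backendEntry` types in place of the gallery package's real ones.

```go
package main

import "fmt"

// nodeRef records which node reported a backend, and at which version.
type nodeRef struct {
	NodeID  string
	Version string
}

// backendEntry is a hypothetical, reduced stand-in for gallery.SystemBackend.
type backendEntry struct {
	Name    string
	Version string // taken from the first node that reports the backend
	Nodes   []nodeRef
}

// report is one (node, backend) pair as returned by a worker.
type report struct {
	NodeID  string
	Backend string
	Version string
}

// aggregate groups reports by backend name while keeping per-node attribution.
func aggregate(reports []report) map[string]backendEntry {
	result := make(map[string]backendEntry)
	for _, r := range reports {
		entry, exists := result[r.Backend]
		if !exists {
			entry = backendEntry{Name: r.Backend, Version: r.Version}
		}
		entry.Nodes = append(entry.Nodes, nodeRef{NodeID: r.NodeID, Version: r.Version})
		// entry is a copy of the map value, so it must be stored back.
		result[r.Backend] = entry
	}
	return result
}

func main() {
	got := aggregate([]report{
		{NodeID: "n1", Backend: "llama-cpp", Version: "1.0"},
		{NodeID: "n2", Backend: "llama-cpp", Version: "1.1"},
		{NodeID: "n2", Backend: "whisper", Version: "2.0"},
	})
	for name, e := range got {
		fmt.Printf("%s: top-level version %s, %d node(s)\n", name, e.Version, len(e.Nodes))
	}
}
```

Keeping the top-level version from the first node seen mirrors the comment in the diff; version drift between nodes is only visible through the per-node list.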
|
||||
|
||||
// InstallBackend fans out backend installation to all healthy worker nodes.
|
||||
// InstallBackend fans out installation through the pending-ops queue so
|
||||
// non-healthy nodes get retried when they come back instead of being silently
|
||||
// skipped. Reply success from the NATS round-trip deletes the queue row;
|
||||
// reply.Success==false is treated as an error so the row stays for retry.
|
||||
func (d *DistributedBackendManager) InstallBackend(ctx context.Context, op *galleryop.ManagementOp[gallery.GalleryBackend, any], progressCb galleryop.ProgressCallback) error {
|
||||
allNodes, err := d.registry.List(context.Background())
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
galleriesJSON, _ := json.Marshal(op.Galleries)
|
||||
backendName := op.GalleryElementName
|
||||
|
||||
for _, node := range allNodes {
|
||||
if node.Status != StatusHealthy {
|
||||
continue
|
||||
}
|
||||
reply, err := d.adapter.InstallBackend(node.ID, backendName, "", string(galleriesJSON))
|
||||
_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendInstall, backendName, galleriesJSON, func(node BackendNode) error {
|
||||
reply, err := d.adapter.InstallBackend(node.ID, backendName, "", string(galleriesJSON), op.ExternalURI, op.ExternalName, op.ExternalAlias)
|
||||
if err != nil {
|
||||
if errors.Is(err, nats.ErrNoResponders) {
|
||||
xlog.Warn("No NATS responders for node, marking unhealthy", "node", node.Name, "nodeID", node.ID)
|
||||
d.registry.MarkUnhealthy(context.Background(), node.ID)
|
||||
continue
|
||||
}
|
||||
xlog.Warn("Failed to install backend on worker", "node", node.Name, "backend", backendName, "error", err)
|
||||
continue
|
||||
return err
|
||||
}
|
||||
if !reply.Success {
|
||||
xlog.Warn("Backend install failed on worker", "node", node.Name, "backend", backendName, "error", reply.Error)
|
||||
return fmt.Errorf("install failed: %s", reply.Error)
|
||||
}
|
||||
}
|
||||
return nil
|
||||
return nil
|
||||
})
|
||||
return err
|
||||
}
|
||||
|
||||
// UpgradeBackend fans out a backend upgrade to all healthy worker nodes.
|
||||
// TODO: Add dedicated NATS subject for upgrade (currently reuses install with force flag)
|
||||
// UpgradeBackend reuses the install NATS subject (the worker re-downloads
|
||||
// from the gallery). Same queue semantics as Install/Delete.
|
||||
func (d *DistributedBackendManager) UpgradeBackend(ctx context.Context, name string, progressCb galleryop.ProgressCallback) error {
|
||||
allNodes, err := d.registry.List(context.Background())
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
galleriesJSON, _ := json.Marshal(d.backendGalleries)
|
||||
var errs []error
|
||||
|
||||
for _, node := range allNodes {
|
||||
if node.Status != StatusHealthy {
|
||||
continue
|
||||
}
|
||||
// Reuse install endpoint which will re-download the backend (force mode)
|
||||
reply, err := d.adapter.InstallBackend(node.ID, name, "", string(galleriesJSON))
|
||||
_, err := d.enqueueAndDrainBackendOp(ctx, OpBackendUpgrade, name, galleriesJSON, func(node BackendNode) error {
|
||||
reply, err := d.adapter.InstallBackend(node.ID, name, "", string(galleriesJSON), "", "", "")
|
||||
if err != nil {
|
||||
if errors.Is(err, nats.ErrNoResponders) {
|
||||
xlog.Warn("No NATS responders for node during upgrade, marking unhealthy", "node", node.Name, "nodeID", node.ID)
|
||||
d.registry.MarkUnhealthy(context.Background(), node.ID)
|
||||
continue
|
||||
}
|
||||
errs = append(errs, fmt.Errorf("node %s: %w", node.Name, err))
|
||||
continue
|
||||
return err
|
||||
}
|
||||
if !reply.Success {
|
||||
errs = append(errs, fmt.Errorf("node %s: %s", node.Name, reply.Error))
|
||||
return fmt.Errorf("upgrade failed: %s", reply.Error)
|
||||
}
|
||||
}
|
||||
|
||||
return errors.Join(errs...)
|
||||
return nil
|
||||
})
|
||||
return err
|
||||
}
|
||||
|
||||
// CheckUpgrades checks for available backend upgrades.
|
||||
// Gallery comparison is global (not per-node), so we delegate to the local manager.
|
||||
// CheckUpgrades checks for available backend upgrades across the cluster.
|
||||
//
|
||||
// The previous implementation delegated to d.local, which called
|
||||
// ListSystemBackends on the frontend — but in distributed mode the frontend
|
||||
// has no backends installed locally, so the upgrade loop never ran and the UI
|
||||
// never surfaced any upgrades. We now feed the cluster-wide aggregation
|
||||
// (including per-node versions/digests) into gallery.CheckUpgradesAgainst so
|
||||
// digest-based detection actually works and cluster drift is visible.
|
||||
func (d *DistributedBackendManager) CheckUpgrades(ctx context.Context) (map[string]gallery.UpgradeInfo, error) {
|
||||
return d.local.CheckUpgrades(ctx)
|
||||
installed, err := d.ListBackends()
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
// systemState is used by AvailableBackends (gallery paths + meta-backend
|
||||
// resolution). The `installed` argument is what the old code got wrong —
|
||||
// it used to come from the empty frontend filesystem.
|
||||
return gallery.CheckUpgradesAgainst(ctx, d.backendGalleries, d.systemState, installed)
|
||||
}
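The comment above says that feeding per-node digests into the upgrade check makes cluster drift visible. The internals of `gallery.CheckUpgradesAgainst` are not part of this diff, so the following is only a guess at the kind of comparison involved: every node whose reported digest differs from the gallery's digest is flagged. All names here (`nodeDigest`, `outdatedNodes`) are hypothetical.

```go
package main

import "fmt"

// nodeDigest is a hypothetical pairing of a node with the digest it reports.
type nodeDigest struct {
	NodeID string
	Digest string
}

// outdatedNodes returns the IDs of nodes whose installed digest differs from
// the digest published by the gallery. This is an assumed comparison rule,
// not the real CheckUpgradesAgainst logic.
func outdatedNodes(galleryDigest string, nodes []nodeDigest) []string {
	var out []string
	for _, n := range nodes {
		if n.Digest != galleryDigest {
			out = append(out, n.NodeID)
		}
	}
	return out
}

func main() {
	nodes := []nodeDigest{
		{NodeID: "n1", Digest: "sha256:aaa"},
		{NodeID: "n2", Digest: "sha256:bbb"},
	}
	stale := outdatedNodes("sha256:bbb", nodes)
	fmt.Printf("upgrade available: %v, outdated nodes: %v\n", len(stale) > 0, stale)
}
```

With an empty `nodes` slice, which is what the frontend's local filesystem yields in distributed mode, the function reports nothing to upgrade. That is the failure mode the diff sets out to fix.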
|
||||
|
||||
@@ -3,26 +3,59 @@ package nodes
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"fmt"
|
||||
"time"
|
||||
|
||||
"github.com/mudler/LocalAI/core/services/advisorylock"
|
||||
grpcclient "github.com/mudler/LocalAI/pkg/grpc"
|
||||
"github.com/mudler/xlog"
|
||||
"github.com/nats-io/nats.go"
|
||||
"gorm.io/gorm"
|
||||
)
|
||||
|
||||
// ModelProber checks whether a model's backend process is still reachable.
|
||||
// Defaulted to a gRPC health probe but overridable for tests so we don't
|
||||
// need to stand up a real server. IsAlive returns false both when the process
|
||||
// is unreachable or times out and when it answers but reports unhealthy; callers treat these alike.
|
||||
type ModelProber interface {
|
||||
IsAlive(ctx context.Context, address string) bool
|
||||
}
|
||||
|
||||
// grpcModelProber does a 1s HealthCheck on the model's stored gRPC address.
|
||||
type grpcModelProber struct{ token string }
|
||||
|
||||
func (g grpcModelProber) IsAlive(ctx context.Context, address string) bool {
|
||||
client := grpcclient.NewClientWithToken(address, false, nil, false, g.token)
|
||||
probeCtx, cancel := context.WithTimeout(ctx, 1*time.Second)
|
||||
defer cancel()
|
||||
ok, _ := client.HealthCheck(probeCtx)
|
||||
return ok
|
||||
}
|
||||
|
||||
// ReplicaReconciler periodically ensures model replica counts match their
|
||||
// scheduling configs. It scales up replicas when below MinReplicas or when
|
||||
// all replicas are busy (up to MaxReplicas), and scales down idle replicas
|
||||
// above MinReplicas.
|
||||
//
|
||||
// Alongside replica scaling it runs two state-reconciliation passes — draining
|
||||
// the pending_backend_ops queue and probing loaded models' gRPC addresses to
|
||||
// orphan ghosts. Both passes are wrapped in the KeyStateReconciler advisory
|
||||
// lock so N frontends don't stampede.
|
||||
//
|
||||
// Only processes models with auto-scaling enabled (MinReplicas > 0 or MaxReplicas > 0).
|
||||
type ReplicaReconciler struct {
|
||||
registry *NodeRegistry
|
||||
scheduler ModelScheduler // interface for scheduling new models
|
||||
unloader NodeCommandSender
|
||||
adapter *RemoteUnloaderAdapter // NATS sender for pending-op drain
|
||||
prober ModelProber // health probe for model gRPC addrs
|
||||
db *gorm.DB
|
||||
interval time.Duration
|
||||
scaleDownDelay time.Duration
|
||||
// probeStaleAfter: only probe node_models rows older than this so we
|
||||
// don't hammer every worker every tick for models we just heard from.
|
||||
probeStaleAfter time.Duration
|
||||
}
|
||||
|
||||
// ModelScheduler abstracts the scheduling logic needed by the reconciler.
|
||||
@@ -35,12 +68,21 @@ type ModelScheduler interface {
|
||||
|
||||
// ReplicaReconcilerOptions holds configuration for creating a ReplicaReconciler.
|
||||
type ReplicaReconcilerOptions struct {
|
||||
Registry *NodeRegistry
|
||||
Scheduler ModelScheduler
|
||||
Unloader NodeCommandSender
|
||||
DB *gorm.DB
|
||||
Interval time.Duration // default 30s
|
||||
ScaleDownDelay time.Duration // default 5m
|
||||
Registry *NodeRegistry
|
||||
Scheduler ModelScheduler
|
||||
Unloader NodeCommandSender
|
||||
// Adapter is the NATS sender used to retry pending backend ops. When nil,
|
||||
// the state-reconciler pending-drain pass is a no-op (single-node mode).
|
||||
Adapter *RemoteUnloaderAdapter
|
||||
// RegistrationToken is used by the default gRPC prober when probing model
|
||||
// addresses. Matches the worker's token so HealthCheck auth succeeds.
|
||||
RegistrationToken string
|
||||
// Prober overrides the default gRPC health probe (used by tests).
|
||||
Prober ModelProber
|
||||
DB *gorm.DB
|
||||
Interval time.Duration // default 30s
|
||||
ScaleDownDelay time.Duration // default 5m
|
||||
ProbeStaleAfter time.Duration // default 2m
|
||||
}
|
||||
|
||||
// NewReplicaReconciler creates a new ReplicaReconciler.
|
||||
@@ -53,13 +95,24 @@ func NewReplicaReconciler(opts ReplicaReconcilerOptions) *ReplicaReconciler {
|
||||
if scaleDownDelay == 0 {
|
||||
scaleDownDelay = 5 * time.Minute
|
||||
}
|
||||
probeStaleAfter := opts.ProbeStaleAfter
|
||||
if probeStaleAfter == 0 {
|
||||
probeStaleAfter = 2 * time.Minute
|
||||
}
|
||||
prober := opts.Prober
|
||||
if prober == nil {
|
||||
prober = grpcModelProber{token: opts.RegistrationToken}
|
||||
}
|
||||
return &ReplicaReconciler{
|
||||
registry: opts.Registry,
|
||||
scheduler: opts.Scheduler,
|
||||
unloader: opts.Unloader,
|
||||
db: opts.DB,
|
||||
interval: interval,
|
||||
scaleDownDelay: scaleDownDelay,
|
||||
registry: opts.Registry,
|
||||
scheduler: opts.Scheduler,
|
||||
unloader: opts.Unloader,
|
||||
adapter: opts.Adapter,
|
||||
prober: prober,
|
||||
db: opts.DB,
|
||||
interval: interval,
|
||||
scaleDownDelay: scaleDownDelay,
|
||||
probeStaleAfter: probeStaleAfter,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -78,17 +131,157 @@ func (rc *ReplicaReconciler) Run(ctx context.Context) {
|
||||
}
|
||||
}
|
||||
|
||||
// reconcileOnce performs a single reconciliation pass.
|
||||
// Uses an advisory lock so only one frontend instance reconciles at a time.
|
||||
// reconcileOnce performs a single reconciliation pass. Replica work and
|
||||
// state-reconciliation work run under *different* advisory locks so multiple
|
||||
// frontends can share load across passes, and one long-running pass doesn't
|
||||
// block the other forever if a frontend wedges.
|
||||
func (rc *ReplicaReconciler) reconcileOnce(ctx context.Context) {
|
||||
if rc.db != nil {
|
||||
lockKey := advisorylock.KeyFromString("replica-reconciler")
|
||||
_ = advisorylock.WithLockCtx(ctx, rc.db, lockKey, func() error {
|
||||
replicaKey := advisorylock.KeyFromString("replica-reconciler")
|
||||
_ = advisorylock.WithLockCtx(ctx, rc.db, replicaKey, func() error {
|
||||
rc.reconcile(ctx)
|
||||
return nil
|
||||
})
|
||||
// Try, don't block: if another frontend is already running the state
|
||||
// pass, this tick is a no-op. Matches the health monitor pattern.
|
||||
_, _ = advisorylock.TryWithLockCtx(ctx, rc.db, advisorylock.KeyStateReconciler, func() error {
|
||||
rc.reconcileState(ctx)
|
||||
return nil
|
||||
})
|
||||
} else {
|
||||
rc.reconcile(ctx)
|
||||
rc.reconcileState(ctx)
|
||||
}
|
||||
}
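`reconcileOnce` mixes two locking styles: the replica pass waits for its lock, while the state pass only runs if the lock is free and otherwise skips the tick. The sketch below shows the try-and-skip style with an in-process `sync.Mutex`, which has offered `TryLock` since Go 1.18. It is an analogue only: the real code takes a database advisory lock through the `advisorylock` package, and `tryWithLock` and `newLock` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sync"
)

// newLock returns a fresh mutex; it exists only to keep the example compact.
func newLock() *sync.Mutex { return &sync.Mutex{} }

// tryWithLock runs fn only if mu is currently free and reports whether fn ran.
// A busy lock is not an error: the caller simply skips this tick.
func tryWithLock(mu *sync.Mutex, fn func()) bool {
	if !mu.TryLock() {
		return false
	}
	defer mu.Unlock()
	fn()
	return true
}

func main() {
	mu := newLock()

	mu.Lock() // simulate another frontend already running the state pass
	ran := tryWithLock(mu, func() { fmt.Println("state pass") })
	fmt.Println("ran while busy:", ran)
	mu.Unlock()

	ran = tryWithLock(mu, func() { fmt.Println("state pass") })
	fmt.Println("ran while free:", ran)
}
```

Skipping instead of waiting means a wedged lock holder costs the other frontends one tick of state reconciliation rather than blocking their loops.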
|
||||
|
||||
// reconcileState runs the state-reconciliation passes: drain pending backend
|
||||
// ops for freshly-healthy nodes, then probe model gRPC addresses to orphan
|
||||
// ghosts. Both passes are best-effort: a failure on one node doesn't stop
|
||||
// the rest.
|
||||
func (rc *ReplicaReconciler) reconcileState(ctx context.Context) {
|
||||
if rc.adapter != nil {
|
||||
rc.drainPendingBackendOps(ctx)
|
||||
}
|
||||
rc.probeLoadedModels(ctx)
|
||||
}
|
||||
|
||||
// drainPendingBackendOps retries queued backend ops whose next_retry_at has
|
||||
// passed on nodes that are currently healthy. On success the row is deleted;
|
||||
// on failure attempts++ and next_retry_at moves out via exponential backoff.
|
||||
func (rc *ReplicaReconciler) drainPendingBackendOps(ctx context.Context) {
|
||||
ops, err := rc.registry.ListDuePendingBackendOps(ctx)
|
||||
if err != nil {
|
||||
xlog.Warn("Reconciler: failed to list pending backend ops", "error", err)
|
||||
return
|
||||
}
|
||||
if len(ops) == 0 {
|
||||
return
|
||||
}
|
||||
xlog.Debug("Reconciler: draining pending backend ops", "count", len(ops))
|
||||
|
||||
for _, op := range ops {
|
||||
if err := ctx.Err(); err != nil {
|
||||
return
|
||||
}
|
||||
var applyErr error
|
||||
switch op.Op {
|
||||
case OpBackendDelete:
|
||||
_, applyErr = rc.adapter.DeleteBackend(op.NodeID, op.Backend)
|
||||
case OpBackendInstall, OpBackendUpgrade:
|
||||
reply, err := rc.adapter.InstallBackend(op.NodeID, op.Backend, "", string(op.Galleries), "", "", "")
|
||||
if err != nil {
|
||||
applyErr = err
|
||||
} else if !reply.Success {
|
||||
applyErr = fmt.Errorf("%s failed: %s", op.Op, reply.Error)
|
||||
}
|
||||
default:
|
||||
xlog.Warn("Reconciler: unknown pending op", "op", op.Op, "id", op.ID)
|
||||
continue
|
||||
}
|
||||
|
||||
if applyErr == nil {
|
||||
if err := rc.registry.DeletePendingBackendOp(ctx, op.ID); err != nil {
|
||||
xlog.Warn("Reconciler: failed to delete drained op row", "id", op.ID, "error", err)
|
||||
} else {
|
||||
xlog.Info("Reconciler: pending backend op applied",
|
||||
"op", op.Op, "backend", op.Backend, "node", op.NodeID, "attempts", op.Attempts+1)
|
||||
}
|
||||
continue
|
||||
}
|
||||
|
||||
// ErrNoResponders means the node has no active NATS subscription for
|
||||
// this subject. Either its connection dropped, or it's the wrong
|
||||
// node type entirely. Mark unhealthy so the health monitor's
|
||||
// heartbeat-only pass doesn't immediately flip it back — and so
|
||||
// ListDuePendingBackendOps (which filters by status=healthy) stops
|
||||
// picking the row until the node genuinely recovers.
|
||||
if errors.Is(applyErr, nats.ErrNoResponders) {
|
||||
xlog.Warn("Reconciler: no NATS responders — marking node unhealthy",
|
||||
"op", op.Op, "backend", op.Backend, "node", op.NodeID)
|
||||
_ = rc.registry.MarkUnhealthy(ctx, op.NodeID)
|
||||
}
|
||||
|
||||
// Dead-letter cap: after maxAttempts the row is the reconciler
|
||||
// equivalent of a poison message. Delete it loudly so the queue
|
||||
// doesn't churn NATS every tick forever — operators can re-issue
|
||||
// the op from the UI if they still want it applied.
|
||||
if op.Attempts+1 >= maxPendingBackendOpAttempts {
|
||||
xlog.Error("Reconciler: abandoning pending backend op after max attempts",
|
||||
"op", op.Op, "backend", op.Backend, "node", op.NodeID,
|
||||
"attempts", op.Attempts+1, "last_error", applyErr)
|
||||
if err := rc.registry.DeletePendingBackendOp(ctx, op.ID); err != nil {
|
||||
xlog.Warn("Reconciler: failed to delete abandoned op row", "id", op.ID, "error", err)
|
||||
}
|
||||
continue
|
||||
}
|
||||
|
||||
_ = rc.registry.RecordPendingBackendOpFailure(ctx, op.ID, applyErr.Error())
|
||||
xlog.Warn("Reconciler: pending backend op retry failed",
|
||||
"op", op.Op, "backend", op.Backend, "node", op.NodeID, "attempts", op.Attempts+1, "error", applyErr)
|
||||
}
|
||||
}
|
||||
|
||||
// maxPendingBackendOpAttempts caps how many times the reconciler retries a
|
||||
// failing row before dead-lettering it. Ten attempts at exponential backoff
|
||||
// (30s → 15m cap) is >1h of wall-clock patience — well past any transient
|
||||
// worker restart or network blip. Poisoned rows beyond that are almost
|
||||
// certainly structural (wrong node type, non-existent gallery entry) and no
|
||||
// amount of further retrying will help.
|
||||
const maxPendingBackendOpAttempts = 10
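The "more than an hour" figure in the comment can be checked with a little arithmetic. The backoff formula itself is not in this excerpt, so the sketch assumes the simplest schedule consistent with the comment: start at 30s, double after each failure, cap at 15m. The names `baseDelay`, `maxDelay`, `backoff` and `totalWait` are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay = 30 * time.Second // assumed first retry delay
	maxDelay  = 15 * time.Minute // assumed cap
)

// backoff returns the assumed wait after the given number of failed attempts
// (1-based): 30s, 1m, 2m, ... doubling until it reaches the cap.
func backoff(attempts int) time.Duration {
	d := baseDelay
	for i := 1; i < attempts; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// totalWait sums the waits that separate `attempts` consecutive tries.
func totalWait(attempts int) time.Duration {
	var total time.Duration
	for i := 1; i < attempts; i++ {
		total += backoff(i)
	}
	return total
}

func main() {
	for i := 1; i <= 9; i++ {
		fmt.Printf("after failure %d: wait %v\n", i, backoff(i))
	}
	fmt.Println("total wait across 10 attempts:", totalWait(10))
}
```

Under these assumptions the nine waits between ten attempts add up to a little over 75 minutes, consistent with the comment. A different base, multiplier or jitter in the real implementation would change the exact figure.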
|
||||
|
||||
// probeLoadedModels gRPC-health-checks model addresses that the DB says are
|
||||
// loaded. If a model's backend process is gone (OOM, crash, manual restart)
|
||||
// we remove the row so ghosts don't linger. Only probes rows older than
|
||||
// probeStaleAfter so we don't hammer every worker every tick for models we
|
||||
// just heard from.
|
||||
func (rc *ReplicaReconciler) probeLoadedModels(ctx context.Context) {
|
||||
var stale []NodeModel
|
||||
cutoff := time.Now().Add(-rc.probeStaleAfter)
|
||||
err := rc.registry.db.WithContext(ctx).
|
||||
Joins("JOIN backend_nodes ON backend_nodes.id = node_models.node_id").
|
||||
Where("node_models.state = ? AND backend_nodes.status = ? AND node_models.updated_at < ? AND node_models.address != ''",
|
||||
"loaded", StatusHealthy, cutoff).
|
||||
Find(&stale).Error
|
||||
if err != nil {
|
||||
xlog.Warn("Reconciler: failed to list loaded models for probe", "error", err)
|
||||
return
|
||||
}
|
||||
for _, m := range stale {
|
||||
if err := ctx.Err(); err != nil {
|
||||
return
|
||||
}
|
||||
if rc.prober.IsAlive(ctx, m.Address) {
|
||||
// Bump updated_at so we don't probe this row again immediately.
|
||||
_ = rc.registry.db.WithContext(ctx).Model(&NodeModel{}).
|
||||
Where("id = ?", m.ID).Update("updated_at", time.Now()).Error
|
||||
continue
|
||||
}
|
||||
if err := rc.registry.RemoveNodeModel(ctx, m.NodeID, m.ModelName); err != nil {
|
||||
xlog.Warn("Reconciler: failed to remove unreachable model", "node", m.NodeID, "model", m.ModelName, "error", err)
|
||||
continue
|
||||
}
|
||||
xlog.Warn("Reconciler: model unreachable, removed from registry",
|
||||
"node", m.NodeID, "model", m.ModelName, "address", m.Address)
|
||||
}
|
||||
}
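Stripped of the database, `probeLoadedModels` is a filter with three outcomes per row: fresh rows are left alone, stale rows that answer the probe get their timestamp bumped, and stale rows that do not answer are dropped. A minimal in-memory sketch follows. The `loadedModel` type, the `staticProber` fake and the `sweep` and `demoSweep` helpers are assumptions for illustration, and passing `now` in explicitly is a choice made here to keep the example deterministic.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// prober mirrors the ModelProber interface from the diff.
type prober interface {
	IsAlive(ctx context.Context, address string) bool
}

// staticProber is a test double: addresses missing from the map are dead.
type staticProber map[string]bool

func (s staticProber) IsAlive(_ context.Context, address string) bool { return s[address] }

// loadedModel is a hypothetical, reduced stand-in for a node_models row.
type loadedModel struct {
	Name      string
	Address   string
	UpdatedAt time.Time
}

// sweep returns the rows that survive one probe pass.
func sweep(ctx context.Context, p prober, rows []loadedModel, staleAfter time.Duration, now time.Time) []loadedModel {
	cutoff := now.Add(-staleAfter)
	kept := make([]loadedModel, 0, len(rows))
	for _, m := range rows {
		if !m.UpdatedAt.Before(cutoff) {
			kept = append(kept, m) // fresh: heard from recently, not probed
			continue
		}
		if p.IsAlive(ctx, m.Address) {
			m.UpdatedAt = now // bump so the next tick does not probe again
			kept = append(kept, m)
		}
		// Stale and unreachable rows are dropped.
	}
	return kept
}

// demoSweep builds a small scenario: one dead, one alive, one fresh row.
func demoSweep() []loadedModel {
	now := time.Now()
	rows := []loadedModel{
		{Name: "stale-dead", Address: "10.0.0.1:1", UpdatedAt: now.Add(-5 * time.Minute)},
		{Name: "stale-alive", Address: "10.0.0.1:2", UpdatedAt: now.Add(-5 * time.Minute)},
		{Name: "fresh", Address: "10.0.0.1:3", UpdatedAt: now},
	}
	p := staticProber{"10.0.0.1:2": true}
	return sweep(context.Background(), p, rows, 2*time.Minute, now)
}

func main() {
	for _, m := range demoSweep() {
		fmt.Println("kept:", m.Name)
	}
}
```

The fresh row survives even though the fake prober would report it dead, because rows newer than the cutoff are never probed.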

@@ -239,3 +239,164 @@ var _ = Describe("ReplicaReconciler", func() {
		})
	})
})

// fakeProber lets tests control whether a model's gRPC address "responds".
type fakeProber struct {
	alive map[string]bool
	calls int
}

func (f *fakeProber) IsAlive(_ context.Context, address string) bool {
	f.calls++
	if f.alive == nil {
		return false
	}
	return f.alive[address]
}

var _ = Describe("ReplicaReconciler — state reconciliation", func() {
	var (
		db       *gorm.DB
		registry *NodeRegistry
	)

	BeforeEach(func() {
		if runtime.GOOS == "darwin" {
			Skip("testcontainers requires Docker, not available on macOS CI")
		}
		db = testutil.SetupTestDB()
		var err error
		registry, err = NewNodeRegistry(db)
		Expect(err).ToNot(HaveOccurred())
	})

	Describe("probeLoadedModels", func() {
		It("removes loaded models whose gRPC address is unreachable", func() {
			node := &BackendNode{Name: "n1", NodeType: NodeTypeBackend, Address: "10.0.0.1:50051"}
			Expect(registry.Register(context.Background(), node, true)).To(Succeed())
			// Two loaded models — one stale (will probe), one fresh (skipped).
			stale := &NodeModel{
				ID:        "stale-1",
				NodeID:    node.ID,
				ModelName: "stale-model",
				Address:   "10.0.0.1:12345",
				State:     "loaded",
				UpdatedAt: time.Now().Add(-5 * time.Minute),
			}
			fresh := &NodeModel{
				ID:        "fresh-1",
				NodeID:    node.ID,
				ModelName: "fresh-model",
				Address:   "10.0.0.1:54321",
				State:     "loaded",
				UpdatedAt: time.Now(), // within probeStaleAfter
			}
			Expect(db.Create(stale).Error).To(Succeed())
			Expect(db.Create(fresh).Error).To(Succeed())

			prober := &fakeProber{alive: map[string]bool{"10.0.0.1:12345": false}}
			rc := NewReplicaReconciler(ReplicaReconcilerOptions{
				Registry:        registry,
				DB:              db,
				Prober:          prober,
				ProbeStaleAfter: 2 * time.Minute,
			})

			rc.probeLoadedModels(context.Background())

			// Stale was unreachable — row removed.
			var after []NodeModel
			Expect(db.Find(&after).Error).To(Succeed())
			Expect(after).To(HaveLen(1))
			Expect(after[0].ModelName).To(Equal("fresh-model"))
			// Prober was only called once (the fresh row was filtered out).
			Expect(prober.calls).To(Equal(1))
		})

		It("keeps reachable models and bumps their updated_at", func() {
			node := &BackendNode{Name: "n1", NodeType: NodeTypeBackend, Address: "10.0.0.1:50051"}
			Expect(registry.Register(context.Background(), node, true)).To(Succeed())
			stale := &NodeModel{
				ID:        "stale-2",
				NodeID:    node.ID,
				ModelName: "alive-model",
				Address:   "10.0.0.1:12345",
				State:     "loaded",
				UpdatedAt: time.Now().Add(-5 * time.Minute),
			}
			Expect(db.Create(stale).Error).To(Succeed())

			prober := &fakeProber{alive: map[string]bool{"10.0.0.1:12345": true}}
			rc := NewReplicaReconciler(ReplicaReconcilerOptions{
				Registry:        registry,
				DB:              db,
				Prober:          prober,
				ProbeStaleAfter: 2 * time.Minute,
			})

			rc.probeLoadedModels(context.Background())

			var after NodeModel
			Expect(db.First(&after, "id = ?", "stale-2").Error).To(Succeed())
			Expect(after.UpdatedAt).To(BeTemporally("~", time.Now(), time.Second))
		})
	})

	Describe("UpsertPendingBackendOp + RecordPendingBackendOpFailure", func() {
		It("upserts on the composite key rather than duplicating rows", func() {
			node := &BackendNode{Name: "n1", NodeType: NodeTypeBackend, Address: "10.0.0.1:50051"}
			Expect(registry.Register(context.Background(), node, true)).To(Succeed())

			Expect(registry.UpsertPendingBackendOp(context.Background(), node.ID, "foo", OpBackendDelete, nil)).To(Succeed())
			// Second call for the same (node, backend, op) should not create a
			// new row — that's how re-issuing a delete works.
			Expect(registry.UpsertPendingBackendOp(context.Background(), node.ID, "foo", OpBackendDelete, nil)).To(Succeed())

			var rows []PendingBackendOp
			Expect(db.Find(&rows).Error).To(Succeed())
			Expect(rows).To(HaveLen(1))
		})

		It("increments attempts and moves next_retry_at out on failure", func() {
			node := &BackendNode{Name: "n1", NodeType: NodeTypeBackend, Address: "10.0.0.1:50051"}
			Expect(registry.Register(context.Background(), node, true)).To(Succeed())
			Expect(registry.UpsertPendingBackendOp(context.Background(), node.ID, "foo", OpBackendDelete, nil)).To(Succeed())

			var row PendingBackendOp
			Expect(db.First(&row).Error).To(Succeed())
			before := row.NextRetryAt

			Expect(registry.RecordPendingBackendOpFailure(context.Background(), row.ID, "boom")).To(Succeed())
			Expect(db.First(&row, row.ID).Error).To(Succeed())
			Expect(row.Attempts).To(Equal(1))
			Expect(row.LastError).To(Equal("boom"))
			Expect(row.NextRetryAt).To(BeTemporally(">", before))
		})
	})

	Describe("NewNodeRegistry malformed-row pruning", func() {
		It("drops queue rows for agent nodes and non-existent nodes on startup", func() {
			agent := &BackendNode{Name: "agent-1", NodeType: NodeTypeAgent, Address: "x"}
			Expect(registry.Register(context.Background(), agent, true)).To(Succeed())
			backend := &BackendNode{Name: "backend-1", NodeType: NodeTypeBackend, Address: "y"}
			Expect(registry.Register(context.Background(), backend, true)).To(Succeed())

			// Three rows: one for a valid backend node (should survive),
			// one for an agent node (pruned), one for an empty backend name
			// on the valid node (pruned).
			Expect(registry.UpsertPendingBackendOp(context.Background(), backend.ID, "foo", OpBackendInstall, nil)).To(Succeed())
			Expect(registry.UpsertPendingBackendOp(context.Background(), agent.ID, "foo", OpBackendInstall, nil)).To(Succeed())
			Expect(registry.UpsertPendingBackendOp(context.Background(), backend.ID, "", OpBackendInstall, nil)).To(Succeed())

			// Re-instantiating the registry runs the cleanup migration.
			_, err := NewNodeRegistry(db)
			Expect(err).ToNot(HaveOccurred())

			var rows []PendingBackendOp
			Expect(db.Find(&rows).Error).To(Succeed())
			Expect(rows).To(HaveLen(1))
			Expect(rows[0].NodeID).To(Equal(backend.ID))
			Expect(rows[0].Backend).To(Equal("foo"))
		})
	})
})

@@ -104,6 +104,36 @@ type NodeWithExtras struct {
	Labels map[string]string `json:"labels,omitempty"`
}

// PendingBackendOp is a durable intent for a backend lifecycle operation
// (delete/install/upgrade) that needs to eventually apply on a specific node.
//
// Without this table, a backend delete against an offline node was silently
// dropped: the frontend skipped the node, the node came back later with the
// backend still installed, and the operator saw a zombie. Now the intent is
// recorded regardless of node status; the state reconciler drains the queue
// whenever a node is healthy and removes the row on success. Reissuing the
// same operation while a row exists updates NextRetryAt instead of stacking
// duplicates (see the unique index).
type PendingBackendOp struct {
	ID          uint      `gorm:"primaryKey;autoIncrement" json:"id"`
	NodeID      string    `gorm:"index;size:36;not null;uniqueIndex:idx_pending_backend_op,priority:1" json:"node_id"`
	Backend     string    `gorm:"index;size:255;not null;uniqueIndex:idx_pending_backend_op,priority:2" json:"backend"`
	Op          string    `gorm:"size:16;not null;uniqueIndex:idx_pending_backend_op,priority:3" json:"op"`
	Galleries   []byte    `gorm:"type:bytea" json:"-"` // serialized JSON for install/upgrade retries
	Attempts    int       `gorm:"default:0" json:"attempts"`
	LastError   string    `gorm:"type:text" json:"last_error,omitempty"`
	CreatedAt   time.Time `json:"created_at"`
	NextRetryAt time.Time `gorm:"index" json:"next_retry_at"`
}

// Op constants mirror the operation names used by DistributedBackendManager
// so callers don't repeat stringly-typed values.
const (
	OpBackendDelete  = "delete"
	OpBackendInstall = "install"
	OpBackendUpgrade = "upgrade"
)

// NodeRegistry manages backend node registration and lookup in PostgreSQL.
type NodeRegistry struct {
	db *gorm.DB
@@ -114,10 +144,34 @@ type NodeRegistry struct {
// when multiple instances (frontend + workers) start at the same time.
func NewNodeRegistry(db *gorm.DB) (*NodeRegistry, error) {
	if err := advisorylock.WithLockCtx(context.Background(), db, advisorylock.KeySchemaMigrate, func() error {
-		return db.AutoMigrate(&BackendNode{}, &NodeModel{}, &NodeLabel{}, &ModelSchedulingConfig{})
+		return db.AutoMigrate(&BackendNode{}, &NodeModel{}, &NodeLabel{}, &ModelSchedulingConfig{}, &PendingBackendOp{})
	}); err != nil {
		return nil, fmt.Errorf("migrating node tables: %w", err)
	}

	// One-shot cleanup of queue rows that can never drain: ops targeted at
	// agent workers (wrong subscription set), at non-existent nodes, or with
	// an empty backend name. The guard in enqueueAndDrainBackendOp prevents
	// new ones from being written, but rows persisted by earlier versions
	// keep the reconciler busy retrying a permanently-failing NATS request
	// every 30s. Guarded by the same migration advisory lock so only one
	// frontend runs it.
	_ = advisorylock.WithLockCtx(context.Background(), db, advisorylock.KeySchemaMigrate, func() error {
		res := db.Exec(`
			DELETE FROM pending_backend_ops
			WHERE backend = ''
			   OR node_id NOT IN (SELECT id FROM backend_nodes WHERE node_type = ? OR node_type = '')
		`, NodeTypeBackend)
		if res.Error != nil {
			xlog.Warn("Failed to prune malformed pending_backend_ops rows", "error", res.Error)
			return res.Error
		}
		if res.RowsAffected > 0 {
			xlog.Info("Pruned pending_backend_ops rows (wrong node type or empty backend)", "count", res.RowsAffected)
		}
		return nil
	})

	return &NodeRegistry{db: db}, nil
}

@@ -946,3 +1000,114 @@ func (r *NodeRegistry) ApplyAutoLabels(ctx context.Context, nodeID string, node
		_ = r.SetNodeLabel(ctx, nodeID, "node.name", node.Name)
	}
}

// UpsertPendingBackendOp records or refreshes a pending backend operation for
// a node. If a row already exists for (nodeID, backend, op) we keep its
// Attempts/LastError but reset NextRetryAt to now, so reissuing the same
// delete/upgrade nudges it to the front of the queue instead of stacking a
// duplicate intent.
func (r *NodeRegistry) UpsertPendingBackendOp(ctx context.Context, nodeID, backend, op string, galleries []byte) error {
	row := PendingBackendOp{
		NodeID:      nodeID,
		Backend:     backend,
		Op:          op,
		Galleries:   galleries,
		NextRetryAt: time.Now(),
	}
	return r.db.WithContext(ctx).Clauses(clause.OnConflict{
		Columns:   []clause.Column{{Name: "node_id"}, {Name: "backend"}, {Name: "op"}},
		DoUpdates: clause.AssignmentColumns([]string{"galleries", "next_retry_at"}),
	}).Create(&row).Error
}

// ListDuePendingBackendOps returns queued ops whose NextRetryAt has passed
// AND whose node is currently healthy. The reconciler drains this list; we
// filter by node status in the query so a tick doesn't hammer NATS for
// nodes that obviously can't answer.
func (r *NodeRegistry) ListDuePendingBackendOps(ctx context.Context) ([]PendingBackendOp, error) {
	var ops []PendingBackendOp
	err := r.db.WithContext(ctx).
		Joins("JOIN backend_nodes ON backend_nodes.id = pending_backend_ops.node_id").
		Where("pending_backend_ops.next_retry_at <= ? AND backend_nodes.status = ?", time.Now(), StatusHealthy).
		Order("pending_backend_ops.next_retry_at ASC").
		Find(&ops).Error
	if err != nil {
		return nil, fmt.Errorf("listing due pending backend ops: %w", err)
	}
	return ops, nil
}

// ListPendingBackendOps returns every queued row (for the UI "pending on N
// nodes" chip and the pre-delete ConfirmDialog).
func (r *NodeRegistry) ListPendingBackendOps(ctx context.Context) ([]PendingBackendOp, error) {
	var ops []PendingBackendOp
	if err := r.db.WithContext(ctx).Order("backend ASC, created_at ASC").Find(&ops).Error; err != nil {
		return nil, fmt.Errorf("listing pending backend ops: %w", err)
	}
	return ops, nil
}

// DeletePendingBackendOp removes a queue row — called after the op succeeds.
func (r *NodeRegistry) DeletePendingBackendOp(ctx context.Context, id uint) error {
	if err := r.db.WithContext(ctx).Delete(&PendingBackendOp{}, id).Error; err != nil {
		return fmt.Errorf("deleting pending backend op %d: %w", id, err)
	}
	return nil
}

// RecordPendingBackendOpFailure bumps Attempts, captures the error, and
// pushes NextRetryAt out with exponential backoff capped at 15 minutes.
func (r *NodeRegistry) RecordPendingBackendOpFailure(ctx context.Context, id uint, errMsg string) error {
	return r.db.WithContext(ctx).Transaction(func(tx *gorm.DB) error {
		var row PendingBackendOp
		if err := tx.First(&row, id).Error; err != nil {
			return err
		}
		row.Attempts++
		row.LastError = errMsg
		row.NextRetryAt = time.Now().Add(backoffForAttempt(row.Attempts))
		return tx.Save(&row).Error
	})
}

// backoffForAttempt is exponential from 30s doubling up to a 15m cap. The
// reconciler tick is 30s so anything shorter would just re-fire immediately.
func backoffForAttempt(attempts int) time.Duration {
	const cap = 15 * time.Minute
	base := 30 * time.Second
	shift := attempts - 1
	if shift < 0 {
		shift = 0
	}
	if shift > 10 { // 2^10 * 30s already exceeds the cap
		shift = 10
	}
	d := base << shift
	if d > cap {
		return cap
	}
	return d
}

// CountPendingBackendOpsByBackend returns a map of backend name to the count
// of pending rows. Used to decorate Manage → Backends with a "pending on N
// nodes" chip without exposing the full queue.
func (r *NodeRegistry) CountPendingBackendOpsByBackend(ctx context.Context) (map[string]int, error) {
	type row struct {
		Backend string
		Count   int
	}
	var rows []row
	err := r.db.WithContext(ctx).Model(&PendingBackendOp{}).
		Select("backend, COUNT(*) as count").
		Group("backend").
		Scan(&rows).Error
	if err != nil {
		return nil, fmt.Errorf("counting pending backend ops: %w", err)
	}
	out := make(map[string]int, len(rows))
	for _, r := range rows {
		out[r.Backend] = r.Count
	}
	return out, nil
}

@@ -504,7 +504,7 @@ func (r *SmartRouter) installBackendOnNode(ctx context.Context, node *BackendNod
		return "", fmt.Errorf("no NATS connection for backend installation")
	}

-	reply, err := r.unloader.InstallBackend(node.ID, backendType, modelID, r.galleriesJSON)
+	reply, err := r.unloader.InstallBackend(node.ID, backendType, modelID, r.galleriesJSON, "", "", "")
	if err != nil {
		return "", err
	}

@@ -244,7 +244,7 @@ type fakeUnloader struct {
	unloadErr error
}

-func (f *fakeUnloader) InstallBackend(_, _, _, _ string) (*messaging.BackendInstallReply, error) {
+func (f *fakeUnloader) InstallBackend(_, _, _, _, _, _, _ string) (*messaging.BackendInstallReply, error) {
	return f.installReply, f.installErr
}

@@ -17,7 +17,7 @@ type backendStopRequest struct {
// NodeCommandSender abstracts NATS-based commands to worker nodes.
// Used by HTTP endpoint handlers to avoid coupling to the concrete RemoteUnloaderAdapter.
type NodeCommandSender interface {
-	InstallBackend(nodeID, backendType, modelID, galleriesJSON string) (*messaging.BackendInstallReply, error)
+	InstallBackend(nodeID, backendType, modelID, galleriesJSON, uri, name, alias string) (*messaging.BackendInstallReply, error)
	DeleteBackend(nodeID, backendName string) (*messaging.BackendDeleteReply, error)
	ListBackends(nodeID string) (*messaging.BackendListReply, error)
	StopBackend(nodeID, backend string) error
@@ -72,7 +72,7 @@ func (a *RemoteUnloaderAdapter) UnloadRemoteModel(modelName string) error {
// The worker installs the backend from gallery (if not already installed),
// starts the gRPC process, and replies when ready.
// Timeout: 5 minutes (gallery install can take a while).
-func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, galleriesJSON string) (*messaging.BackendInstallReply, error) {
+func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, galleriesJSON, uri, name, alias string) (*messaging.BackendInstallReply, error) {
	subject := messaging.SubjectNodeBackendInstall(nodeID)
	xlog.Info("Sending NATS backend.install", "nodeID", nodeID, "backend", backendType, "modelID", modelID)

@@ -80,6 +80,9 @@ func (a *RemoteUnloaderAdapter) InstallBackend(nodeID, backendType, modelID, gal
		Backend:          backendType,
		ModelID:          modelID,
		BackendGalleries: galleriesJSON,
		URI:              uri,
		Name:             name,
		Alias:            alias,
	}, 5*time.Minute)
}

@@ -14,11 +14,13 @@ LocalAI provides endpoints to monitor and manage running backends. The `/backend

### Request

-The request body is JSON:
+The model to monitor is passed as a query parameter:

-| Parameter | Type | Required | Description |
-|-----------|----------|----------|--------------------------------|
-| `model` | `string` | Yes | Name of the model to monitor |
+| Parameter | Type | Required | Location | Description |
+|-----------|----------|----------|----------|--------------------------------|
+| `model` | `string` | Yes | query | Name of the model to monitor |

For backwards compatibility, a JSON body with the same field is still accepted when the `model` query parameter is not set, but new clients should use the query parameter.

### Response

@@ -42,9 +44,7 @@ If the gRPC status call fails, the endpoint falls back to local process metrics:
### Usage

```bash
-curl http://localhost:8080/backend/monitor \
-  -H "Content-Type: application/json" \
-  -d '{"model": "my-model"}'
+curl "http://localhost:8080/backend/monitor?model=my-model"
```
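
For clients written in Go, the query-parameter form of the request can be built with the standard library so that model names containing reserved characters are escaped correctly. This is a minimal sketch: `monitorURL` is a hypothetical helper, not part of LocalAI, and it assumes the server is mounted at the root of the base URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// monitorURL returns the /backend/monitor URL for the given model, passing
// the model name as an escaped query parameter.
func monitorURL(base, model string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parsing base URL: %w", err)
	}
	u.Path = "/backend/monitor" // replaces any path present in base
	q := url.Values{}
	q.Set("model", model)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	target, err := monitorURL("http://localhost:8080", "my-model")
	if err != nil {
		panic(err)
	}
	fmt.Println(target)
}
```

The resulting string can be passed to `http.Get` or any other HTTP client.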

### Example response

@@ -130,6 +130,19 @@ Reference for system information commands and diagnostics.

---

### 🤖 [AI Coding Assistants](ai-coding-assistants.md)
Policy for AI-assisted contributions — licensing, DCO, and attribution.

**Key topics:**
- Aligned with the Linux kernel's AI assistants policy
- Signed-off-by and DCO rules
- `Assisted-by` commit trailer format
- Scope and responsibility of the human submitter

**Recommended for:** Contributors using AI coding assistants (Claude, Copilot, Cursor, Codex, etc.)

---

## Quick Links

| Task | Documentation |
@@ -138,6 +151,7 @@ Reference for system information commands and diagnostics.
| CLI commands | [CLI Reference](cli-reference.md) |
| Check compatibility | [Compatibility Table](compatibility-table.md) |
| System diagnostics | [System Info](system-info.md) |
| Contribute with AI assistance | [AI Coding Assistants](ai-coding-assistants.md) |

---

79
docs/content/reference/ai-coding-assistants.md
Normal file
@@ -0,0 +1,79 @@
+++
disableToc = false
title = "AI Coding Assistants"
weight = 28
+++

This document provides guidance for AI tools and developers using AI assistance when contributing to LocalAI.

**LocalAI follows the same guidelines as the Linux kernel project for AI-assisted contributions.** See the upstream policy here: <https://docs.kernel.org/process/coding-assistants.html>. The rules below mirror that policy, adapted to LocalAI's license and project layout.

AI tools helping with LocalAI development should follow the standard project development process:

- [CONTRIBUTING.md](https://github.com/mudler/LocalAI/blob/master/CONTRIBUTING.md) — development workflow, commit conventions, and PR guidelines
- [AGENTS.md](https://github.com/mudler/LocalAI/blob/master/AGENTS.md) — the agent entry point with links to all detailed topic guides
- [.agents/ai-coding-assistants.md](https://github.com/mudler/LocalAI/blob/master/.agents/ai-coding-assistants.md) — the full policy source of truth

## Licensing and Legal Requirements

All contributions must comply with LocalAI's licensing requirements:

- LocalAI is licensed under the **MIT License**
- New source files should use the SPDX license identifier `MIT` where applicable to the file type
- Contributions must be compatible with the MIT License and must not introduce code under incompatible licenses (e.g., GPL) without an explicit discussion with maintainers

## Signed-off-by and Developer Certificate of Origin

**AI agents MUST NOT add `Signed-off-by` tags.** Only humans can legally certify the Developer Certificate of Origin (DCO). The human submitter is responsible for:

- Reviewing all AI-generated code
- Ensuring compliance with licensing requirements
- Adding their own `Signed-off-by` tag (when the project requires DCO) to certify the contribution
- Taking full responsibility for the contribution

AI agents MUST NOT add `Co-Authored-By` trailers for themselves either. A human reviewer owns the contribution; the AI's involvement is recorded via `Assisted-by` (see below).

## Attribution

When AI tools contribute to LocalAI development, proper attribution helps track the evolving role of AI in the development process. Contributions should include an `Assisted-by` tag in the commit message trailer in the following format:

```
Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]
```

Where:

- `AGENT_NAME` — name of the AI tool or framework (e.g., `Claude`, `Copilot`, `Cursor`)
- `MODEL_VERSION` — specific model version used (e.g., `claude-opus-4-7`, `gpt-5`)
- `[TOOL1] [TOOL2]` — optional specialized analysis tools invoked by the agent (e.g., `golangci-lint`, `staticcheck`, `go vet`)

Basic development tools (git, go, make, editors) should **not** be listed.

### Example

```
fix(llama-cpp): handle empty tool call arguments

Previously the parser panicked when the model returned a tool call with
an empty arguments object. Fall back to an empty JSON object in that
case so downstream consumers receive a valid payload.

Assisted-by: Claude:claude-opus-4-7 golangci-lint
Signed-off-by: Jane Developer <jane@example.com>
```

## Scope and Responsibility

Using an AI assistant does not reduce the contributor's responsibility. The human submitter must:

- Understand every line that lands in the PR
- Verify that generated code compiles, passes tests, and follows the project style
- Confirm that any referenced APIs, flags, or file paths actually exist in the current tree (AI models may hallucinate identifiers)
- Not submit AI output verbatim without review

Reviewers may ask for clarification on any change regardless of how it was produced. "An AI wrote it" is not an acceptable answer to a design question.

{{% notice note %}}
This policy is a living document. If you're unsure how to apply it to a specific contribution, open an issue or ask in the [Discord channel](https://discord.gg/uJAeKSAGDy) before submitting.
{{% /notice %}}
@@ -33,7 +33,7 @@ LocalAI will attempt to automatically load models which are not explicitly confi
|---------|-------------|-------------|
| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, Metal |
-| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, ROCm, Metal |
+| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, Metal |
| [moonshine](https://github.com/moonshine-ai/moonshine) | Ultra-fast transcription for low-end devices | CPU, CUDA 12/13, Metal |
| [voxtral](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in C | CPU, Metal |
| [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR) | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
@@ -1,4 +1,206 @@
|
||||
---
|
||||
- name: "qwen3.6-35b-a3b-claude-4.6-opus-reasoning-distilled"
|
||||
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
|
||||
urls:
|
||||
- https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
|
||||
description: |
|
||||
# 🔥 Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled
|
||||
|
||||
A reasoning SFT fine-tune of `Qwen/Qwen3.6-35B-A3B` on chain-of-thought (CoT) distillation mostly sourced from Claude Opus 4.6. The goal is to preserve Qwen3.6's strong agentic coding and reasoning base while nudging the model toward structured Claude Opus-style reasoning traces and more stable long-form problem solving.
|
||||
|
||||
The training path is text-only. The Qwen3.6 base architecture includes a vision encoder, but this fine-tuning run did not train on image or video examples.
|
||||
|
||||
- **Developed by:** @hesamation
|
||||
- **Base model:** `Qwen/Qwen3.6-35B-A3B`
|
||||
- **License:** apache-2.0
|
||||
|
||||
This fine-tuning run is inspired by Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, including the notebook/training workflow style and Claude Opus reasoning-distillation direction.
|
||||
|
||||
[](https://x.com/Hesamation) [](https://discord.gg/vtJykN3t)
|
||||
|
||||
## Benchmark Results
|
||||
|
||||
The MMLU-Pro pass used 70 total questions per model: `--limit 5` across 14 MMLU-Pro subjects. Treat this as a smoke/comparative check, not a release-quality full benchmark.
|
||||
|
||||
...
|
||||
license: "apache-2.0"
|
||||
tags:
|
||||
- llm
|
||||
- gguf
|
||||
- qwen
|
||||
- reasoning
|
||||
icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_35b_a3b_score.png
|
||||
overrides:
|
||||
backend: llama-cpp
|
||||
function:
|
||||
automatic_tool_parsing_fallback: true
|
||||
grammar:
|
||||
disable: true
|
||||
known_usecases:
|
||||
- chat
|
||||
options:
|
||||
- use_jinja:true
|
||||
parameters:
|
||||
min_p: 0
|
||||
model: llama-cpp/models/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
|
||||
presence_penalty: 1.5
|
||||
repeat_penalty: 1
|
||||
temperature: 0.7
|
||||
top_k: 20
|
||||
top_p: 0.8
|
||||
template:
|
||||
use_tokenizer_template: true
|
||||
files:
|
||||
- filename: llama-cpp/models/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
|
||||
sha256: fd3bf7586354890a2710d69357c30fb221a31eecf9f3cd9418257d9289e02765
|
||||
uri: https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF/resolve/main/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled.Q4_K_M.gguf
|
||||
- name: "qwen3.5-9b-glm5.1-distill-v1"
|
||||
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
|
||||
urls:
|
||||
- https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF
|
||||
description: |
|
||||
# 🪐 Qwen3.5-9B-GLM5.1-Distill-v1
|
||||
|
||||
## 📌 Model Overview
|
||||
|
||||
**Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1`
|
||||
**Base Model:** Qwen3.5-9B
|
||||
**Training Type:** Supervised Fine-Tuning (SFT, Distillation)
|
||||
**Parameter Scale:** 9B
|
||||
**Training Framework:** Unsloth
|
||||
|
||||
This model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**.
|
||||
|
||||
The primary goals are to:
|
||||
|
||||
- Improve **structured reasoning ability**
|
||||
- Enhance **instruction-following consistency**
|
||||
- Activate **latent knowledge via better reasoning structure**
|
||||
|
||||
## 📊 Training Data
|
||||
|
||||
### Main Dataset
|
||||
|
||||
- `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`
|
||||
- Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset.
|
||||
- Generated from a **GLM-5.1 teacher model**
|
||||
- Approximately **700x** the scale of `Qwen3.5-reasoning-700x`
|
||||
- Training used a **filtered subset**, not the full source dataset.
|
||||
|
||||
### Auxiliary Dataset
|
||||
|
||||
- `Jackrong/Qwen3.5-reasoning-700x`
|
||||
|
||||
...
|
||||
license: "apache-2.0"
|
||||
tags:
|
||||
- llm
|
||||
- gguf
|
||||
- qwen
|
||||
- instruction-tuned
|
||||
- reasoning
|
||||
icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/BnSg_x99v9bG9T5-8sKa1.png
|
||||
overrides:
|
||||
backend: llama-cpp
|
||||
function:
|
||||
automatic_tool_parsing_fallback: true
|
||||
grammar:
|
||||
disable: true
|
||||
known_usecases:
|
||||
- chat
|
||||
mmproj: llama-cpp/mmproj/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/mmproj.gguf
|
||||
options:
|
||||
- use_jinja:true
|
||||
parameters:
|
||||
min_p: 0
|
||||
model: llama-cpp/models/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
|
||||
presence_penalty: 1.5
|
||||
repeat_penalty: 1
|
||||
temperature: 0.7
|
||||
top_k: 20
|
||||
top_p: 0.8
|
||||
template:
|
||||
use_tokenizer_template: true
|
||||
files:
|
||||
- filename: llama-cpp/models/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
|
||||
sha256: f6f1d2b8efb2339ce9d4dd0f0329d2f2e4cf765eda49aa3f6df8f629f871a151
|
||||
uri: https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/resolve/main/Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf
|
||||
- filename: llama-cpp/mmproj/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/mmproj.gguf
|
||||
sha256: e42c1c2ed0eaf6ea88a6ba10b26b4adf00a96a8c3d1803534a4c41060ad9e86b
|
||||
uri: https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1-GGUF/resolve/main/mmproj.gguf
|
||||
- name: "supergemma4-26b-uncensored-v2"
|
||||
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
|
||||
urls:
|
||||
- https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2
|
||||
description: |
|
||||
Hugging Face |
|
||||
GitHub |
|
||||
Launch Blog |
|
||||
Documentation
|
||||
|
||||
License: Apache 2.0 | Authors: Google DeepMind
|
||||
|
||||
Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.
|
||||
|
||||
Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.
|
||||
|
||||
Gemma 4 introduces key **capability and architectural advancements**:
|
||||
|
||||
* **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes.
|
||||
|
||||
...
|
||||
license: "gemma"
|
||||
tags:
|
||||
- llm
|
||||
- gguf
|
||||
icon: https://ai.google.dev/gemma/images/gemma4_banner.png
|
||||
overrides:
|
||||
backend: llama-cpp
|
||||
function:
|
||||
automatic_tool_parsing_fallback: true
|
||||
grammar:
|
||||
disable: true
|
||||
known_usecases:
|
||||
- chat
|
||||
options:
|
||||
- use_jinja:true
|
||||
parameters:
|
||||
model: llama-cpp/models/supergemma4-26b-uncensored-gguf-v2/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
|
||||
template:
|
||||
use_tokenizer_template: true
|
||||
files:
|
||||
- filename: llama-cpp/models/supergemma4-26b-uncensored-gguf-v2/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
|
||||
sha256: e773b0a209d48524f9d485bca0818247f75d7ddde7cce951367a7e441fb59137
|
||||
uri: https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2/resolve/main/supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf
|
||||
- name: "qwopus-glm-18b-merged"
|
||||
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
|
||||
urls:
|
||||
- https://huggingface.co/Jackrong/Qwopus-GLM-18B-Merged-GGUF
|
||||
description: "# \U0001FA90 Qwen3.5-9B-GLM5.1-Distill-v1\n\n## \U0001F4CC Model Overview\n\n**Model Name:** `Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1`\n**Base Model:** Qwen3.5-9B\n**Training Type:** Supervised Fine-Tuning (SFT, Distillation)\n**Parameter Scale:** 9B\n**Training Framework:** Unsloth\n\nThis model is a distilled variant of **Qwen3.5-9B**, trained on high-quality reasoning data derived from **GLM-5.1**.\n\nThe primary goals are to:\n\n - Improve **structured reasoning ability**\n - Enhance **instruction-following consistency**\n - Activate **latent knowledge via better reasoning structure**\n\n## \U0001F4CA Training Data\n\n### Main Dataset\n\n - `Jackrong/GLM-5.1-Reasoning-1M-Cleaned`\n - Cleaned from the original `Kassadin88/GLM-5.1-1000000x` dataset.\n - Generated from a **GLM-5.1 teacher model**\n - Approximately **700x** the scale of `Qwen3.5-reasoning-700x`\n - Training used a **filtered subset**, not the full source dataset.\n\n### Auxiliary Dataset\n\n - `Jackrong/Qwen3.5-reasoning-700x`\n\n...\n"
|
||||
license: "apache-2.0"
|
||||
tags:
|
||||
- llm
|
||||
- gguf
|
||||
- reasoning
|
||||
icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/BnSg_x99v9bG9T5-8sKa1.png
|
||||
overrides:
|
||||
backend: llama-cpp
|
||||
function:
|
||||
automatic_tool_parsing_fallback: true
|
||||
grammar:
|
||||
disable: true
|
||||
known_usecases:
|
||||
- chat
|
||||
options:
|
||||
- use_jinja:true
|
||||
parameters:
|
||||
model: llama-cpp/models/Qwopus-GLM-18B-Merged-GGUF/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
|
||||
template:
|
||||
use_tokenizer_template: true
|
||||
files:
|
||||
- filename: llama-cpp/models/Qwopus-GLM-18B-Merged-GGUF/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
|
||||
sha256: 13bd039f95c9ea46ef1d75905faa7be6ca4e47a5af9d4cf62e298a738a5b195f
|
||||
uri: https://huggingface.co/Jackrong/Qwopus-GLM-18B-Merged-GGUF/resolve/main/Qwopus-GLM-18B-Healed-Q4_K_M.gguf
|
||||
- name: "qwen3.6-35b-a3b-apex"
|
||||
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
|
||||
urls:
|
||||
@@ -887,6 +1089,8 @@
|
||||
- gpu
|
||||
overrides:
|
||||
backend: neutts
|
||||
parameters:
|
||||
model: neuphonic/neutts-air
|
||||
known_usecases:
|
||||
- tts
|
||||
- name: vllm-omni-z-image-turbo
|
||||
@@ -15186,14 +15390,16 @@
|
||||
- gpu
|
||||
overrides:
|
||||
parameters:
|
||||
model: wan2.1-t2v-1.3B-Q8_0.gguf
|
||||
model: wan2.1_t2v_1.3b-q8_0.gguf
|
||||
files:
|
||||
- filename: "wan2.1-t2v-1.3B-Q8_0.gguf"
|
||||
uri: "huggingface://calcuis/wan-gguf/wan2.1-t2v-1.3B-Q8_0.gguf"
|
||||
- filename: "wan2.1_t2v_1.3b-q8_0.gguf"
|
||||
sha256: "8f10260cc26498fee303851ee1c2047918934125731b9b78d4babfce4ec27458"
|
||||
uri: "huggingface://calcuis/wan-gguf/wan2.1_t2v_1.3b-q8_0.gguf"
|
||||
- filename: "wan_2.1_vae.safetensors"
|
||||
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
|
||||
- filename: "umt5-xxl-encoder-Q8_0.gguf"
|
||||
uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
|
||||
sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
|
||||
- name: wan-2.1-i2v-14b-480p-ggml
|
||||
license: apache-2.0
|
||||
url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
|
||||
@@ -15214,11 +15420,103 @@
|
||||
model: wan2.1-i2v-14b-480p-Q4_K_M.gguf
|
||||
options:
|
||||
- "clip_vision_path:clip_vision_h.safetensors"
|
||||
- "diffusion_model"
|
||||
- "vae_decode_only:false"
|
||||
- "sampler:euler"
|
||||
- "flow_shift:3.0"
|
||||
- "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
|
||||
- "vae_path:wan_2.1_vae.safetensors"
|
||||
files:
|
||||
- filename: "wan2.1-i2v-14b-480p-Q4_K_M.gguf"
|
||||
sha256: "d91f7139acadb42ea05cdf97b311e5099f714f11fbe4d90916500e2f53cbba82"
|
||||
uri: "huggingface://city96/Wan2.1-I2V-14B-480P-gguf/wan2.1-i2v-14b-480p-Q4_K_M.gguf"
|
||||
- filename: "wan_2.1_vae.safetensors"
|
||||
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
|
||||
- filename: "umt5-xxl-encoder-Q8_0.gguf"
|
||||
uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
|
||||
sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
|
||||
- filename: "clip_vision_h.safetensors"
|
||||
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors"
|
||||
- name: wan-2.1-flf2v-14b-720p-ggml
|
||||
license: apache-2.0
|
||||
url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
|
||||
description: |
|
||||
Wan 2.1 FLF2V 14B 720P — first-last-frame-to-video diffusion, GGUF Q4_K_M.
|
||||
Takes a start and end reference image and interpolates a 33-frame clip
|
||||
between them. Unlike the plain I2V variant this model feeds the end
|
||||
frame through clip_vision as well, so it conditions semantically (not
|
||||
just in pixel-space) on both endpoints. That makes it the right choice
|
||||
for seamless loops (start_image == end_image) and clean narrative cuts.
|
||||
Native 720p but accepts 480p resolutions; shares the same VAE, t5xxl
|
||||
text encoder, and clip_vision_h as I2V 14B.
|
||||
urls:
|
||||
- https://huggingface.co/city96/Wan2.1-FLF2V-14B-720P-gguf
|
||||
tags:
|
||||
- image-to-video
|
||||
- first-last-frame-to-video
|
||||
- wan
|
||||
- video-generation
|
||||
- cpu
|
||||
- gpu
|
||||
overrides:
|
||||
parameters:
|
||||
model: wan2.1-flf2v-14b-720p-Q4_K_M.gguf
|
||||
options:
|
||||
- "clip_vision_path:clip_vision_h.safetensors"
|
||||
- "diffusion_model"
|
||||
- "vae_decode_only:false"
|
||||
- "sampler:euler"
|
||||
- "flow_shift:3.0"
|
||||
- "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
|
||||
- "vae_path:wan_2.1_vae.safetensors"
|
||||
files:
|
||||
- filename: "wan2.1-flf2v-14b-720p-Q4_K_M.gguf"
|
||||
sha256: "7652d7d8b0795009ff21ed83d806af762aae8a8faa8640dd07b3a67e4dfab445"
|
||||
uri: "huggingface://city96/Wan2.1-FLF2V-14B-720P-gguf/wan2.1-flf2v-14b-720p-Q4_K_M.gguf"
|
||||
- filename: "wan_2.1_vae.safetensors"
|
||||
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
|
||||
- filename: "umt5-xxl-encoder-Q8_0.gguf"
|
||||
uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
|
||||
sha256: 2521d4de0bf9e1cc6549866463ceae85e4ec3239bc6063f7488810be39033bbc
|
||||
- filename: "clip_vision_h.safetensors"
|
||||
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/clip_vision/clip_vision_h.safetensors"
|
||||
- name: wan-2.1-i2v-14b-720p-ggml
|
||||
license: apache-2.0
|
||||
url: "github:mudler/LocalAI/gallery/wan-ggml.yaml@master"
|
||||
description: |
|
||||
Wan 2.1 I2V 14B 720P — image-to-video diffusion, GGUF Q4_K_M.
|
||||
Native 720p sibling of the 480p I2V model: animates a single
|
||||
reference image into a 33-frame clip at up to 1280x720. Trained
|
||||
purely as image-to-video (no first-last-frame interpolation path),
|
||||
so motion is freer and better-suited to single-anchor animation
|
||||
than repurposing the FLF2V 720P variant for i2v. Shares the same
|
||||
VAE, umt5_xxl text encoder, and clip_vision_h as the I2V 14B 480P
|
||||
and FLF2V 14B 720P entries.
|
||||
urls:
|
||||
- https://huggingface.co/city96/Wan2.1-I2V-14B-720P-gguf
|
||||
tags:
|
||||
- image-to-video
|
||||
- wan
|
||||
- video-generation
|
||||
- cpu
|
||||
- gpu
|
||||
overrides:
|
||||
parameters:
|
||||
model: wan2.1-i2v-14b-720p-Q4_K_M.gguf
|
||||
options:
|
||||
- "clip_vision_path:clip_vision_h.safetensors"
|
||||
- "diffusion_model"
|
||||
- "vae_decode_only:false"
|
||||
- "sampler:euler"
|
||||
- "flow_shift:3.0"
|
||||
- "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
|
||||
- "vae_path:wan_2.1_vae.safetensors"
|
||||
files:
|
||||
- filename: "wan2.1-i2v-14b-720p-Q4_K_M.gguf"
|
||||
sha256: "ffecd91e4b636d8e3e43f3fa388218158ba447109547bde777c6d67ef4fe42a4"
|
||||
uri: "huggingface://city96/Wan2.1-I2V-14B-720P-gguf/wan2.1-i2v-14b-720p-Q4_K_M.gguf"
|
||||
- filename: "wan_2.1_vae.safetensors"
|
||||
uri: "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors"
|
||||
- filename: "umt5-xxl-encoder-Q8_0.gguf"
|
||||
uri: "huggingface://city96/umt5-xxl-encoder-gguf/umt5-xxl-encoder-Q8_0.gguf"
|
||||
- filename: "clip_vision_h.safetensors"
|
||||
|
||||
@@ -9,11 +9,6 @@ config_file: |
|
||||
- "diffusion_model"
|
||||
- "vae_decode_only:false"
|
||||
- "sampler:euler"
|
||||
- "scheduler:discrete"
|
||||
- "flow_shift:3.0"
|
||||
- "diffusion_flash_attn:true"
|
||||
- "offload_params_to_cpu:true"
|
||||
- "keep_vae_on_cpu:true"
|
||||
- "keep_clip_on_cpu:true"
|
||||
- "t5xxl_path:umt5-xxl-encoder-Q8_0.gguf"
|
||||
- "vae_path:wan_2.1_vae.safetensors"
|
||||
|
||||
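The `options` lists in these entries mix `key:value` strings such as `sampler:euler` with bare flags such as `diffusion_model`. One possible way to read such a list is sketched below. The helper `parseOptions` and the choice to map a bare flag to `"true"` are assumptions for illustration only; the backend's real parsing may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// parseOptions (hypothetical helper) splits each entry at the first colon.
// Entries without a colon are treated as boolean flags and stored as "true",
// which is an assumption of this sketch.
func parseOptions(opts []string) map[string]string {
	out := make(map[string]string, len(opts))
	for _, o := range opts {
		key, value, found := strings.Cut(o, ":")
		if !found {
			value = "true"
		}
		out[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return out
}

func main() {
	// Option strings copied from the gallery entries in this diff.
	opts := parseOptions([]string{
		"clip_vision_path:clip_vision_h.safetensors",
		"diffusion_model",
		"vae_decode_only:false",
		"sampler:euler",
		"flow_shift:3.0",
	})
	fmt.Println(opts["sampler"], opts["flow_shift"], opts["diffusion_model"])
}
```

Splitting at the first colon only keeps values that themselves contain a colon intact.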
45
go.mod
@@ -8,13 +8,13 @@ require (
|
||||
github.com/Masterminds/sprig/v3 v3.3.0
|
||||
github.com/alecthomas/kong v1.14.0
|
||||
github.com/anthropics/anthropic-sdk-go v1.27.0
|
||||
github.com/aws/aws-sdk-go-v2 v1.41.5
|
||||
github.com/aws/aws-sdk-go-v2/config v1.32.14
|
||||
github.com/aws/aws-sdk-go-v2/credentials v1.19.14
|
||||
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1
|
||||
github.com/aws/aws-sdk-go-v2 v1.41.6
|
||||
github.com/aws/aws-sdk-go-v2/config v1.32.16
|
||||
github.com/aws/aws-sdk-go-v2/credentials v1.19.15
|
||||
github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1
|
||||
github.com/charmbracelet/glamour v1.0.0
|
||||
github.com/containerd/containerd v1.7.30
|
||||
github.com/coreos/go-oidc/v3 v3.17.0
|
||||
github.com/containerd/containerd v1.7.31
|
||||
github.com/coreos/go-oidc/v3 v3.18.0
|
||||
github.com/dhowden/tag v0.0.0-20240417053706-3d75831295e8
|
||||
github.com/ebitengine/purego v0.10.0
|
||||
github.com/emirpasic/gods/v2 v2.0.0-alpha
|
||||
@@ -35,7 +35,7 @@ require (
|
||||
github.com/lithammer/fuzzysearch v1.1.8
|
||||
github.com/mholt/archiver/v3 v3.5.1
|
||||
github.com/microcosm-cc/bluemonday v1.0.27
|
||||
github.com/modelcontextprotocol/go-sdk v1.4.1
|
||||
github.com/modelcontextprotocol/go-sdk v1.5.0
|
||||
github.com/mudler/cogito v0.9.5-0.20260315222927-63abdec7189b
|
||||
github.com/mudler/edgevpn v0.31.1
|
||||
github.com/mudler/go-processmanager v0.1.0
|
||||
@@ -75,24 +75,23 @@ require (
|
||||
)
|
||||
|
||||
require (
|
||||
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 // indirect
|
||||
github.com/aws/smithy-go v1.24.2 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/signin v1.0.10 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/sso v1.30.16 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20 // indirect
|
||||
github.com/aws/aws-sdk-go-v2/service/sts v1.42.0 // indirect
|
||||
github.com/aws/smithy-go v1.25.0 // indirect
|
||||
github.com/bahlo/generic-list-go v0.2.0 // indirect
|
||||
github.com/buger/jsonparser v1.1.1 // indirect
|
||||
github.com/go-jose/go-jose/v4 v4.1.3 // indirect
|
||||
github.com/go-jose/go-jose/v4 v4.1.4 // indirect
|
||||
github.com/jinzhu/inflection v1.0.0 // indirect
|
||||
github.com/jinzhu/now v1.1.5 // indirect
|
||||
github.com/mattn/go-sqlite3 v1.14.24 // indirect
|
||||
|
||||
94
go.sum
@@ -70,44 +70,42 @@ github.com/anthropics/anthropic-sdk-go v1.27.0 h1:0CWbmBq5ofGAjF2H6lefCNRbnaUMGi
|
||||
github.com/anthropics/anthropic-sdk-go v1.27.0/go.mod h1:qUKmaW+uuPB64iy1l+4kOSvaLqPXnHTTBKH6RVZ7q5Q=
|
||||
github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5 h1:0CwZNZbxp69SHPdPJAN/hZIm0C4OItdklCFmMRWYpio=
|
||||
github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5/go.mod h1:wHh0iHkYZB8zMSxRWpUBQtwG5a7fFgvEO+odwuTv2gs=
|
||||
github.com/aws/aws-sdk-go-v2 v1.41.5 h1:dj5kopbwUsVUVFgO4Fi5BIT3t4WyqIDjGKCangnV/yY=
|
||||
github.com/aws/aws-sdk-go-v2 v1.41.5/go.mod h1:mwsPRE8ceUUpiTgF7QmQIJ7lgsKUPQOUl3o72QBrE1o=
|
||||
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7 h1:3kGOqnh1pPeddVa/E37XNTaWJ8W6vrbYV9lJEkCnhuY=
|
||||
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.7/go.mod h1:lyw7GFp3qENLh7kwzf7iMzAxDn+NzjXEAGjKS2UOKqI=
|
||||
github.com/aws/aws-sdk-go-v2/config v1.32.14 h1:opVIRo/ZbbI8OIqSOKmpFaY7IwfFUOCCXBsUpJOwDdI=
|
||||
github.com/aws/aws-sdk-go-v2/config v1.32.14/go.mod h1:U4/V0uKxh0Tl5sxmCBZ3AecYny4UNlVmObYjKuuaiOo=
|
||||
github.com/aws/aws-sdk-go-v2/credentials v1.19.14 h1:n+UcGWAIZHkXzYt87uMFBv/l8THYELoX6gVcUvgl6fI=
|
||||
github.com/aws/aws-sdk-go-v2/credentials v1.19.14/go.mod h1:cJKuyWB59Mqi0jM3nFYQRmnHVQIcgoxjEMAbLkpr62w=
|
||||
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21 h1:NUS3K4BTDArQqNu2ih7yeDLaS3bmHD0YndtA6UP884g=
|
||||
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.21/go.mod h1:YWNWJQNjKigKY1RHVJCuupeWDrrHjRqHm0N9rdrWzYI=
|
||||
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21 h1:Rgg6wvjjtX8bNHcvi9OnXWwcE0a2vGpbwmtICOsvcf4=
|
||||
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.21/go.mod h1:A/kJFst/nm//cyqonihbdpQZwiUhhzpqTsdbhDdRF9c=
|
||||
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21 h1:PEgGVtPoB6NTpPrBgqSE5hE/o47Ij9qk/SEZFbUOe9A=
|
||||
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.21/go.mod h1:p+hz+PRAYlY3zcpJhPwXlLC4C+kqn70WIHwnzAfs6ps=
|
||||
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6 h1:qYQ4pzQ2Oz6WpQ8T3HvGHnZydA72MnLuFK9tJwmrbHw=
|
||||
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.6/go.mod h1:O3h0IK87yXci+kg6flUKzJnWeziQUKciKrLjcatSNcY=
|
||||
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21 h1:SwGMTMLIlvDNyhMteQ6r8IJSBPlRdXX5d4idhIGbkXA=
|
||||
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.21/go.mod h1:UUxgWxofmOdAMuqEsSppbDtGKLfR04HGsD0HXzvhI1k=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7 h1:5EniKhLZe4xzL7a+fU3C2tfUN4nWIqlLesfrjkuPFTY=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.7/go.mod h1:x0nZssQ3qZSnIcePWLvcoFisRXJzcTVvYpAAdYX8+GI=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12 h1:qtJZ70afD3ISKWnoX3xB0J2otEqu3LqicRcDBqsj0hQ=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.12/go.mod h1:v2pNpJbRNl4vEUWEh5ytQok0zACAKfdmKS51Hotc3pQ=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21 h1:c31//R3xgIJMSC8S6hEVq+38DcvUlgFY0FM6mSI5oto=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.21/go.mod h1:r6+pf23ouCB718FUxaqzZdbpYFyDtehyZcmP5KL9FkA=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20 h1:siU1A6xjUZ2N8zjTHSXFhB9L/2OY8Dqs0xXiLjF30jA=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.20/go.mod h1:4TLZCmVJDM3FOu5P5TJP0zOlu9zWgDWU7aUxWbr+rcw=
|
||||
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1 h1:csi9NLpFZXb9fxY7rS1xVzgPRGMt7MSNWeQ6eo247kE=
|
||||
github.com/aws/aws-sdk-go-v2/service/s3 v1.97.1/go.mod h1:qXVal5H0ChqXP63t6jze5LmFalc7+ZE7wOdLtZ0LCP0=
|
||||
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9 h1:QKZH0S178gCmFEgst8hN0mCX1KxLgHBKKY/CLqwP8lg=
|
||||
github.com/aws/aws-sdk-go-v2/service/signin v1.0.9/go.mod h1:7yuQJoT+OoH8aqIxw9vwF+8KpvLZ8AWmvmUWHsGQZvI=
|
||||
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15 h1:lFd1+ZSEYJZYvv9d6kXzhkZu07si3f+GQ1AaYwa2LUM=
|
||||
github.com/aws/aws-sdk-go-v2/service/sso v1.30.15/go.mod h1:WSvS1NLr7JaPunCXqpJnWk1Bjo7IxzZXrZi1QQCkuqM=
|
||||
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19 h1:dzztQ1YmfPrxdrOiuZRMF6fuOwWlWpD2StNLTceKpys=
|
||||
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.19/go.mod h1:YO8TrYtFdl5w/4vmjL8zaBSsiNp3w0L1FfKVKenZT7w=
|
||||
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10 h1:p8ogvvLugcR/zLBXTXrTkj0RYBUdErbMnAFFp12Lm/U=
|
||||
github.com/aws/aws-sdk-go-v2/service/sts v1.41.10/go.mod h1:60dv0eZJfeVXfbT1tFJinbHrDfSJ2GZl4Q//OSSNAVw=
|
||||
github.com/aws/smithy-go v1.24.2 h1:FzA3bu/nt/vDvmnkg+R8Xl46gmzEDam6mZ1hzmwXFng=
|
||||
github.com/aws/smithy-go v1.24.2/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc=
|
||||
github.com/aws/aws-sdk-go-v2 v1.41.6 h1:1AX0AthnBQzMx1vbmir3Y4WsnJgiydmnJjiLu+LvXOg=
|
||||
github.com/aws/aws-sdk-go-v2 v1.41.6/go.mod h1:dy0UzBIfwSeot4grGvY1AqFWN5zgziMmWGzysDnHFcQ=
|
||||
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9 h1:adBsCIIpLbLmYnkQU+nAChU5yhVTvu5PerROm+/Kq2A=
|
||||
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.9/go.mod h1:uOYhgfgThm/ZyAuJGNQ5YgNyOlYfqnGpTHXvk3cpykg=
|
||||
github.com/aws/aws-sdk-go-v2/config v1.32.16 h1:Q0iQ7quUgJP0F/SCRTieScnaMdXr9h/2+wze1u3cNeM=
|
||||
github.com/aws/aws-sdk-go-v2/config v1.32.16/go.mod h1:duCCnJEFqpt2RC6no1iK6q+8HpwOAkiUua0pY507dQc=
|
||||
github.com/aws/aws-sdk-go-v2/credentials v1.19.15 h1:fyvgWTszojq8hEnMi8PPBTvZdTtEVmAVyo+NFLHBhH4=
|
||||
github.com/aws/aws-sdk-go-v2/credentials v1.19.15/go.mod h1:gJiYyMOjNg8OEdRWOf3CrFQxM2a98qmrtjx1zuiQfB8=
|
||||
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22 h1:IOGsJ1xVWhsi+ZO7/NW8OuZZBtMJLZbk4P5HDjJO0jQ=
|
||||
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.22/go.mod h1:b+hYdbU+jGKfXE8kKM6g1+h+L/Go3vMvzlxBsiuGsxg=
|
||||
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22 h1:GmLa5Kw1ESqtFpXsx5MmC84QWa/ZrLZvlJGa2y+4kcQ=
|
||||
github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.22/go.mod h1:6sW9iWm9DK9YRpRGga/qzrzNLgKpT2cIxb7Vo2eNOp0=
|
||||
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22 h1:dY4kWZiSaXIzxnKlj17nHnBcXXBfac6UlsAx2qL6XrU=
|
||||
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.22/go.mod h1:KIpEUx0JuRZLO7U6cbV204cWAEco2iC3l061IxlwLtI=
|
||||
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23 h1:FPXsW9+gMuIeKmz7j6ENWcWtBGTe1kH8r9thNt5Uxx4=
|
||||
github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.23/go.mod h1:7J8iGMdRKk6lw2C+cMIphgAnT8uTwBwNOsGkyOCm80U=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8 h1:HtOTYcbVcGABLOVuPYaIihj6IlkqubBwFj10K5fxRek=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.8/go.mod h1:VsK9abqQeGlzPgUr+isNWzPlK2vKe9INMLWnY65f5Xs=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14 h1:xnvDEnw+pnj5mctWiYuFbigrEzSm35x7k4KS/ZkCANg=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.14/go.mod h1:yS5rNogD8e0Wu9+l3MUwr6eENBzEeGejvINpN5PAYfY=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22 h1:PUmZeJU6Y1Lbvt9WFuJ0ugUK2xn6hIWUBBbKuOWF30s=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.13.22/go.mod h1:nO6egFBoAaoXze24a2C0NjQCvdpk8OueRoYimvEB9jo=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22 h1:SE+aQ4DEqG53RRCAIHlCf//B2ycxGH7jFkpnAh/kKPM=
|
||||
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.19.22/go.mod h1:ES3ynECd7fYeJIL6+oax+uIEljmfps0S70BaQzbMd/o=
|
||||
github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1 h1:kU/eBN5+MWNo/LcbNa4hWDdN76hdcd7hocU5kvu7IsU=
|
||||
github.com/aws/aws-sdk-go-v2/service/s3 v1.99.1/go.mod h1:Fw9aqhJicIVee1VytBBjH+l+5ov6/PhbtIK/u3rt/ls=
|
||||
github.com/aws/aws-sdk-go-v2/service/signin v1.0.10 h1:a1Fq/KXn75wSzoJaPQTgZO0wHGqE9mjFnylnqEPTchA=
|
||||
github.com/aws/aws-sdk-go-v2/service/signin v1.0.10/go.mod h1:p6+MXNxW7IA6dMgHfTAzljuwSKD0NCm/4lbS4t6+7vI=
|
||||
github.com/aws/aws-sdk-go-v2/service/sso v1.30.16 h1:x6bKbmDhsgSZwv6q19wY/u3rLk/3FGjJWyqKcIRufpE=
|
||||
github.com/aws/aws-sdk-go-v2/service/sso v1.30.16/go.mod h1:CudnEVKRtLn0+3uMV0yEXZ+YZOKnAtUJ5DmDhilVnIw=
|
||||
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20 h1:oK/njaL8GtyEihkWMD4k3VgHCT64RQKkZwh0DG5j8ak=
|
||||
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.35.20/go.mod h1:JHs8/y1f3zY7U5WcuzoJ/yAYGYtNIVPKLIbp61euvmg=
|
||||
github.com/aws/aws-sdk-go-v2/service/sts v1.42.0 h1:ks8KBcZPh3PYISr5dAiXCM5/Thcuxk8l+PG4+A0exds=
|
||||
github.com/aws/aws-sdk-go-v2/service/sts v1.42.0/go.mod h1:pFw33T0WLvXU3rw1WBkpMlkgIn54eCB5FYLhjDc9Foo=
|
||||
github.com/aws/smithy-go v1.25.0 h1:Sz/XJ64rwuiKtB6j98nDIPyYrV1nVNJ4YU74gttcl5U=
|
||||
github.com/aws/smithy-go v1.25.0/go.mod h1:YE2RhdIuDbA5E5bTdciG9KrW3+TiEONeUWCqxX9i1Fc=
|
||||
github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
|
||||
github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
|
||||
github.com/aymanbagabas/go-udiff v0.2.0 h1:TK0fH4MteXUDspT88n8CKzvK0X9O2xu9yQjWpi6yML8=
|
||||
@@ -198,8 +196,8 @@ github.com/cloudflare/circl v1.6.1/go.mod h1:uddAzsPgqdMAYatqJ0lsjX1oECcQLIlRpzZ
|
||||
github.com/cncf/udpa/go v0.0.0-20191209042840-269d4d468f6f/go.mod h1:M8M6+tZqaGXZJjfX53e64911xZQV5JYwmTeXPW+k8Sc=
|
||||
github.com/containerd/cgroups v1.1.0 h1:v8rEWFl6EoqHB+swVNjVoCJE8o3jX7e8nqBGPLaDFBM=
|
||||
github.com/containerd/cgroups v1.1.0/go.mod h1:6ppBcbh/NOOUU+dMKrykgaBnK9lCIBxHqJDGwsa1mIw=
|
||||
github.com/containerd/containerd v1.7.30 h1:/2vezDpLDVGGmkUXmlNPLCCNKHJ5BbC5tJB5JNzQhqE=
|
||||
github.com/containerd/containerd v1.7.30/go.mod h1:fek494vwJClULlTpExsmOyKCMUAbuVjlFsJQc4/j44M=
|
||||
github.com/containerd/containerd v1.7.31 h1:jn3IMuTV4Bb1Uwb0MFPW2ASJAD3W1lh6QqqZHIZwDh4=
|
||||
github.com/containerd/containerd v1.7.31/go.mod h1:jdwD6s/BhV4XVJGrvtziNPVA+83n66TwptVaPKprq4E=
|
||||
github.com/containerd/continuity v0.4.4 h1:/fNVfTJ7wIl/YPMHjf+5H32uFhl63JucB34PlCpMKII=
|
||||
github.com/containerd/continuity v0.4.4/go.mod h1:/lNJvtJKUQStBzpVQ1+rasXO1LAWtUQssk28EZvJ3nE=
|
||||
github.com/containerd/errdefs v1.0.0 h1:tg5yIfIlQIrxYtu9ajqY42W3lpS19XqdxRQeEwYG8PI=
|
||||
@@ -212,8 +210,8 @@ github.com/containerd/platforms v0.2.1 h1:zvwtM3rz2YHPQsF2CHYM8+KtB5dvhISiXh5ZpS
|
||||
github.com/containerd/platforms v0.2.1/go.mod h1:XHCb+2/hzowdiut9rkudds9bE5yJ7npe7dG/wG+uFPw=
|
||||
github.com/containerd/stargz-snapshotter/estargz v0.18.2 h1:yXkZFYIzz3eoLwlTUZKz2iQ4MrckBxJjkmD16ynUTrw=
|
||||
github.com/containerd/stargz-snapshotter/estargz v0.18.2/go.mod h1:XyVU5tcJ3PRpkA9XS2T5us6Eg35yM0214Y+wvrZTBrY=
|
||||
github.com/coreos/go-oidc/v3 v3.17.0 h1:hWBGaQfbi0iVviX4ibC7bk8OKT5qNr4klBaCHVNvehc=
|
||||
github.com/coreos/go-oidc/v3 v3.17.0/go.mod h1:wqPbKFrVnE90vty060SB40FCJ8fTHTxSwyXJqZH+sI8=
|
||||
github.com/coreos/go-oidc/v3 v3.18.0 h1:V9orjXynvu5wiC9SemFTWnG4F45v403aIcjWo0d41+A=
|
||||
github.com/coreos/go-oidc/v3 v3.18.0/go.mod h1:DYCf24+ncYi+XkIH97GY1+dqoRlbaSI26KVTCI9SrY4=
|
||||
github.com/coreos/go-systemd v0.0.0-20181012123002-c6f51f82210d/go.mod h1:F5haX7vjVVG0kc13fIWeqUViNPyEJxv/OmvnBo0Yme4=
|
||||
github.com/coreos/go-systemd/v22 v22.5.0/go.mod h1:Y58oyj3AT4RCenI/lSvhwexgC+NSVTIJ3seZv2GcEnc=
|
||||
github.com/cpuguy83/dockercfg v0.3.2 h1:DlJTyZGBDlXqUZ2Dk2Q3xHs/FtnooJJVaad2S9GKorA=
|
||||
@@ -336,8 +334,8 @@ github.com/go-gl/gl v0.0.0-20231021071112-07e5d0ea2e71 h1:5BVwOaUSBTlVZowGO6VZGw
|
||||
github.com/go-gl/gl v0.0.0-20231021071112-07e5d0ea2e71/go.mod h1:9YTyiznxEY1fVinfM7RvRcjRHbw2xLBJ3AAGIT0I4Nw=
|
||||
github.com/go-gl/glfw/v3.3/glfw v0.0.0-20240506104042-037f3cc74f2a h1:vxnBhFDDT+xzxf1jTJKMKZw3H0swfWk9RpWbBbDK5+0=
|
||||
github.com/go-gl/glfw/v3.3/glfw v0.0.0-20240506104042-037f3cc74f2a/go.mod h1:tQ2UAYgL5IevRw8kRxooKSPJfGvJ9fJQFa0TUsXzTg8=
|
||||
github.com/go-jose/go-jose/v4 v4.1.3 h1:CVLmWDhDVRa6Mi/IgCgaopNosCaHz7zrMeF9MlZRkrs=
|
||||
github.com/go-jose/go-jose/v4 v4.1.3/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
|
||||
github.com/go-jose/go-jose/v4 v4.1.4 h1:moDMcTHmvE6Groj34emNPLs/qtYXRVcd6S7NHbHz3kA=
|
||||
github.com/go-jose/go-jose/v4 v4.1.4/go.mod h1:x4oUasVrzR7071A4TnHLGSPpNOm2a21K9Kf04k1rs08=
|
||||
github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
|
||||
github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI=
|
||||
github.com/go-logr/logr v1.4.3/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
|
||||
@@ -385,8 +383,8 @@ github.com/gofrs/flock v0.13.0/go.mod h1:jxeyy9R1auM5S6JYDBhDt+E2TCo7DkratH4Pgi8
|
||||
github.com/gogo/protobuf v1.1.1/go.mod h1:r8qH/GZQm5c6nD/R0oafs1akxWv10x8SbQlK7atdtwQ=
|
||||
github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
|
||||
github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q=
|
||||
github.com/golang-jwt/jwt/v5 v5.3.0 h1:pv4AsKCKKZuqlgs5sUmn4x8UlGa0kEVt/puTpKx9vvo=
|
||||
github.com/golang-jwt/jwt/v5 v5.3.0/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
|
||||
github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
|
||||
github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
|
||||
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q=
|
||||
github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
|
||||
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
|
||||
@@ -691,8 +689,8 @@ github.com/moby/sys/userns v0.1.0 h1:tVLXkFOxVu9A64/yh59slHVv9ahO9UIev4JZusOLG/g
|
||||
github.com/moby/sys/userns v0.1.0/go.mod h1:IHUYgu/kao6N8YZlp9Cf444ySSvCmDlmzUcYfDHOl28=
|
||||
github.com/moby/term v0.5.2 h1:6qk3FJAFDs6i/q3W/pQ97SX192qKfZgGjCQqfCJkgzQ=
|
||||
github.com/moby/term v0.5.2/go.mod h1:d3djjFCrjnB+fl8NJux+EJzu0msscUP+f8it8hPkFLc=
|
||||
github.com/modelcontextprotocol/go-sdk v1.4.1 h1:M4x9GyIPj+HoIlHNGpK2hq5o3BFhC+78PkEaldQRphc=
|
||||
github.com/modelcontextprotocol/go-sdk v1.4.1/go.mod h1:Bo/mS87hPQqHSRkMv4dQq1XCu6zv4INdXnFZabkNU6s=
|
||||
github.com/modelcontextprotocol/go-sdk v1.5.0 h1:CHU0FIX9kpueNkxuYtfYQn1Z0slhFzBZuq+x6IiblIU=
|
||||
github.com/modelcontextprotocol/go-sdk v1.5.0/go.mod h1:gggDIhoemhWs3BGkGwd1umzEXCEMMvAnhTrnbXJKKKA=
|
||||
github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
|
||||
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd h1:TRLaZ9cD/w8PVh93nsPXa1VrQ6jlwL5oN8l14QlcNfg=
|
||||
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q=
|
||||
|
||||
@@ -159,7 +159,6 @@ var _ = Describe("CapabilityFilterDisabled", func() {
|
||||
os.Setenv(capabilityEnv, "disable")
|
||||
s := &SystemState{}
|
||||
Expect(s.IsBackendCompatible("cuda12-whisperx", "quay.io/nvidia-cuda-12")).To(BeTrue())
|
||||
Expect(s.IsBackendCompatible("rocm-whisperx", "quay.io/rocm")).To(BeTrue())
|
||||
Expect(s.IsBackendCompatible("metal-whisperx", "quay.io/metal-darwin")).To(BeTrue())
|
||||
Expect(s.IsBackendCompatible("intel-whisperx", "quay.io/intel-sycl")).To(BeTrue())
|
||||
Expect(s.IsBackendCompatible("cpu-whisperx", "quay.io/cpu")).To(BeTrue())
|
||||
|
||||
@@ -985,13 +985,11 @@ const docTemplate = `{
|
||||
"summary": "Backend monitor endpoint",
|
||||
"parameters": [
|
||||
{
|
||||
"description": "Backend statistics request",
|
||||
"name": "request",
|
||||
"in": "body",
|
||||
"required": true,
|
||||
"schema": {
|
||||
"$ref": "#/definitions/schema.BackendMonitorRequest"
|
||||
}
|
||||
"type": "string",
|
||||
"description": "Name of the model to monitor",
|
||||
"name": "model",
|
||||
"in": "query",
|
||||
"required": true
|
||||
}
|
||||
],
|
||||
"responses": {
|
||||
@@ -2408,6 +2406,23 @@ const docTemplate = `{
|
||||
}
|
||||
}
|
||||
},
|
||||
"gallery.NodeDriftInfo": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"digest": {
|
||||
"type": "string"
|
||||
},
|
||||
"node_id": {
|
||||
"type": "string"
|
||||
},
|
||||
"node_name": {
|
||||
"type": "string"
|
||||
},
|
||||
"version": {
|
||||
"type": "string"
|
||||
}
|
||||
}
|
||||
},
|
||||
"gallery.UpgradeInfo": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
@@ -2425,6 +2440,13 @@ const docTemplate = `{
|
||||
},
|
||||
"installed_version": {
|
||||
"type": "string"
|
||||
},
|
||||
"node_drift": {
|
||||
"description": "NodeDrift lists nodes whose installed version or digest differs from\nthe cluster majority. Non-empty means the cluster has diverged and an\nupgrade will realign it. Empty in single-node mode.",
|
||||
"type": "array",
|
||||
"items": {
|
||||
"$ref": "#/definitions/gallery.NodeDriftInfo"
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
|
||||
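The hunk above lists both a body parameter named `request` and a required query parameter named `model` for the backend monitor endpoint. A client that sends the model name in the query string might build its URL as sketched below. The base URL, the `/backend/monitor` path and the helper name `monitorURL` are assumptions for this example, since the route itself is not visible in this hunk.

```go
package main

import (
	"fmt"
	"net/url"
)

// monitorURL (hypothetical helper) attaches the model name as the "model"
// query parameter, letting net/url handle escaping.
func monitorURL(base, model string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("model", model)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Host and path are placeholders, not taken from the diff.
	s, err := monitorURL("http://localhost:8080/backend/monitor", "my model.gguf")
	if err != nil {
		fmt.Println("bad base URL:", err)
		return
	}
	fmt.Println(s)
}
```

Building the query through `url.Values` rather than string concatenation avoids broken requests when a model name contains spaces or reserved characters.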
@@ -982,13 +982,11 @@
 "summary": "Backend monitor endpoint",
 "parameters": [
 {
-"description": "Backend statistics request",
-"name": "request",
-"in": "body",
-"required": true,
-"schema": {
-"$ref": "#/definitions/schema.BackendMonitorRequest"
-}
+"type": "string",
+"description": "Name of the model to monitor",
+"name": "model",
+"in": "query",
+"required": true
 }
 ],
 "responses": {
@@ -2405,6 +2403,23 @@
 }
 }
 },
+"gallery.NodeDriftInfo": {
+"type": "object",
+"properties": {
+"digest": {
+"type": "string"
+},
+"node_id": {
+"type": "string"
+},
+"node_name": {
+"type": "string"
+},
+"version": {
+"type": "string"
+}
+}
+},
 "gallery.UpgradeInfo": {
 "type": "object",
 "properties": {
@@ -2422,6 +2437,13 @@
 },
 "installed_version": {
 "type": "string"
+},
+"node_drift": {
+"description": "NodeDrift lists nodes whose installed version or digest differs from\nthe cluster majority. Non-empty means the cluster has diverged and an\nupgrade will realign it. Empty in single-node mode.",
+"type": "array",
+"items": {
+"$ref": "#/definitions/gallery.NodeDriftInfo"
+}
 }
 }
 },
@@ -157,6 +157,17 @@ definitions:
 type: string
 type: array
 type: object
+gallery.NodeDriftInfo:
+properties:
+digest:
+type: string
+node_id:
+type: string
+node_name:
+type: string
+version:
+type: string
+type: object
 gallery.UpgradeInfo:
 properties:
 available_digest:
@@ -169,6 +180,14 @@ definitions:
 type: string
 installed_version:
 type: string
+node_drift:
+description: |-
+NodeDrift lists nodes whose installed version or digest differs from
+the cluster majority. Non-empty means the cluster has diverged and an
+upgrade will realign it. Empty in single-node mode.
+items:
+$ref: '#/definitions/gallery.NodeDriftInfo'
+type: array
 type: object
 galleryop.OpStatus:
 properties:
@@ -2363,12 +2382,11 @@ paths:
 /backend/monitor:
 get:
 parameters:
-- description: Backend statistics request
-in: body
-name: request
+- description: Name of the model to monitor
+in: query
+name: model
 required: true
-schema:
-$ref: '#/definitions/schema.BackendMonitorRequest'
+type: string
 responses:
 "200":
 description: Response
@@ -57,7 +57,7 @@ var _ = Describe("Node Backend Lifecycle (NATS-driven)", Label("Distributed"), f
 FlushNATS(infra.NC)

 adapter := nodes.NewRemoteUnloaderAdapter(registry, infra.NC)
-installReply, err := adapter.InstallBackend(node.ID, "llama-cpp", "", "")
+installReply, err := adapter.InstallBackend(node.ID, "llama-cpp", "", "", "", "", "")
 Expect(err).ToNot(HaveOccurred())
 Expect(installReply.Success).To(BeTrue())
 })
@@ -78,7 +78,7 @@ var _ = Describe("Node Backend Lifecycle (NATS-driven)", Label("Distributed"), f
 FlushNATS(infra.NC)

 adapter := nodes.NewRemoteUnloaderAdapter(registry, infra.NC)
-installReply, err := adapter.InstallBackend(node.ID, "nonexistent", "", "")
+installReply, err := adapter.InstallBackend(node.ID, "nonexistent", "", "", "", "", "")
 Expect(err).ToNot(HaveOccurred())
 Expect(installReply.Success).To(BeFalse())
 Expect(installReply.Error).To(ContainSubstring("backend not found"))